[GH-ISSUE #2120] Pangolin leaking memory after upgrading to 1.13.1 #8839
Originally created by @Ragnaruk on GitHub (Dec 18, 2025).
Original GitHub issue: https://github.com/fosrl/pangolin/issues/2120
Originally assigned to: @oschwartz10612, @miloschwartz on GitHub.
Describe the Bug
Pangolin seems to be leaking memory after upgrading from 1.10.2(?) to 1.13.1.
Environment
To Reproduce
~8k allowed and ~2k blocked requests a day.
Expected Behavior
Memory usage stays constant.
@Ragnaruk commented on GitHub (Dec 18, 2025):
Not sure if the cache is the problem, but you should consider periodically logging its stats.
Also, there's no real need to use a cache here if MaxMind is not enabled. Plus, using a separate cache (maybe an LRU TTL one?) and limiting the number of keys also sounds like a good idea.
@nath1416 commented on GitHub (Dec 19, 2025):
I also have this problem, after setting up the GeoLite2-Country database. My 1 GiB VPS with 1 GiB of swap runs out of memory, and Docker gets killed by the OOM killer.
Here is a screenshot of CPU usage, which spiked to 100% when the containers get killed.
Using:
@oschwartz10612 commented on GitHub (Dec 19, 2025):
Hm, that's interesting. Did this only happen after adding the country database, @nath1416?
@nath1416 commented on GitHub (Dec 19, 2025):
Yes, I upgraded to 1.13.1 and added the country database at the same time.
After that, I have memory leaks.
Currently I have 298985 total requests and 7373 blocked.
Did not find anything interesting in the logs, but will check if it happens again and try to provide them here.
@djcrafts commented on GitHub (Dec 21, 2025):
I've opened PR #2133 that should fix this memory leak.
What was changed:
Added maxKeys: 10000 limit to the cache to prevent unbounded growth (uses LRU eviction)
Skip caching when GeoIP/ASN lookups return undefined (e.g., when MaxMind isn't configured)
Added cache stats logging every 5 minutes for monitoring
The cache was growing without limits - especially problematic with GeoIP enabled since every unique IP gets cached. The 10k key limit should be plenty for normal traffic while preventing OOM issues.
@nath1416 @Ragnaruk Would appreciate if you could test this when you get a chance, since you're both experiencing the issue.
@oschwartz10612 commented on GitHub (Dec 22, 2025):
Please reopen if still an issue on 1.14+
@Ragnaruk commented on GitHub (Dec 22, 2025):
Version: 1.14.0-rc.0.
I debugged the process and researched the problem for a bit, and I am ~90% certain that it's not a memory leak but memory fragmentation.
Unfortunately, switching to jemalloc didn't seem to help much.
For now, I've just set a memory limit and am accepting periodic restarts.
@oschwartz10612 commented on GitHub (Dec 22, 2025):
Hm, interesting. So limiting the cache must not have helped.
What is the memory footprint when it gets restarted?
@Ragnaruk commented on GitHub (Dec 22, 2025):
Fresh process: 296 MB.
Upper limit: >1.5 GB.
@oschwartz10612 commented on GitHub (Dec 22, 2025):
It might need more memory than that, but if it keeps increasing indefinitely then that's a problem. This is a Node.js application, so I would expect a high base memory load and for it to fluctuate as it garbage collects and so on.
@Ragnaruk commented on GitHub (Dec 22, 2025):
pmap returns the following, so I'm pretty sure it's musl allocations rather than JS.
@jjeuriss commented on GitHub (Dec 22, 2025):
I'm able to reproduce this too, by scrolling through my photos on my NAS (which are accessed through Pangolin). This seems to heavily increase memory usage, mainly in pangolin, and eventually completely locks up my VPS. On this 1GB VPS, Pangolin ran smoothly for a few months before version 1.13.1; I upgraded from 1.11 via 1.12, but didn't use 1.12 for long. I'm planning to export a log of memory usage and a CSV to see the metrics. Unfortunately, running a Prometheus client is likely pushing it, so I'm periodically logging "docker stats --no-stream" to a file to get an idea of the memory pressure.
If I can help in any other way, please let me know.
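For anyone wanting the same lightweight logging, a minimal sketch of such a periodic docker stats logger (container names and the log path are assumptions; adjust to your stack):

#!/bin/sh
# Append a timestamped docker stats snapshot for the stack every 60 s.
while true; do
  echo "$(date -Iseconds) $(docker stats --no-stream \
    --format '{{.Name}} {{.CPUPerc}} {{.MemUsage}}' \
    pangolin gerbil traefik | tr '\n' ';')" >> /var/log/pangolin-stats.log
  sleep 60
done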
@jjeuriss commented on GitHub (Dec 24, 2025):
My problems are not over yet with 1.14.0.
Is there a way to downgrade to 1.12 or will the migrated databases not allow that?
@joerg-hro commented on GitHub (Dec 24, 2025):
I have the same problem. I had a backup of the config folder (1.10.1). I restored it. Then I installed the image (1.10.1). It's working again.
@oschwartz10612 commented on GitHub (Dec 24, 2025):
@jjeuriss I would be curious if this is the pangolin container or the Traefik container. Are you able to profile the containers and reproduce?
@jjeuriss commented on GitHub (Dec 25, 2025):
@oschwartz10612 Although I frequently run into the problem, I fear my 'reproduction' scenario isn't actually helping to reproduce the problem.
Had to do manual reboots of my VPS to recover:
2025-12-24 14:07
2025-12-25 06:07
2025-12-25 07:53
I am keeping statistics on memory and CPU, and plotted one period where my VPS got stuck after only 2 hours (on 2025-12-25 07:53) in an Excel. Tbh I don't see anything really weird w.r.t. memory usage. I see CPU spikes after the docker starts up; that seems normal...
Excel and full CSV attached.
I've also attached the pangolin docker logs, which shows some errors.
pangolin.log
memory-metrics-chart-2025-12-25-07-53-00-v3.xlsx
memory-log-test.zip
(graphs: Pangolin and Traefik container memory usage)
The dips/spikes in the graph are where my VPS restarts (it completely locks up, can't even SSH into it anymore, so I have to reboot it).
These graphs are IMHO inconclusive. I don't know why it happens, only that it happens on 1.13 and didn't use to happen on 1.11.
I'm not sure exactly how to profile a docker container. Is there a guide or set of commands you want me to run?
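Not an official guide, but one way to get a heap profiler onto the pangolin container is to temporarily run the same image with Node's inspector enabled and tunnel the debug port; a sketch, with the volume path, image tag, and SSH details as assumptions (the entry command is the node process seen in ps later in this thread):

# Stop the stack's pangolin service, then run the same image with the
# inspector listening:
docker compose stop pangolin
docker run --rm --name pangolin-debug \
  -v ./config:/app/config \
  -p 127.0.0.1:9229:9229 \
  fosrl/pangolin:1.14.1 \
  node --enable-source-maps --inspect=0.0.0.0:9229 dist/server.mjs

# From your workstation, tunnel the port and open chrome://inspect:
ssh -L 9229:127.0.0.1:9229 user@your-vps

Note this debug container doesn't join the compose network or publish the app's normal ports, so add those (from your compose file) if you want to drive real traffic through it.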
@Josh-Voyles commented on GitHub (Dec 25, 2025):
I'm also experiencing issues on my AWS instance. After running for a while, Pangolin locks up. The only service I can access is Authentik, which is not protected. I'm running 1.14.1.
@asardaes commented on GitHub (Dec 26, 2025):
I didn't experience OOM kills as such, but I did report abnormal IO in #2134 (which also led to the VM freezing entirely and needing a reboot). I eventually figured out it was memory pressure forcing memory pages to disk and immediately reading them again in some endless death loop. What worked for me was to enable zram swap as described in the arch wiki - I configured 512M for my 1G VPS, and since actual usage gets compressed, I think my memory usage actually went down. I also applied the optimizations mentioned in section 2.4 from that wiki except for
vm.watermark_boost_factor; from what I could gather, that might not really help for a VM with little memory, so I left it at the default of 10. I've had zero issues since, although I don't have a lot of requests going through the VPS, so YMMV.
@nath1416 commented on GitHub (Dec 26, 2025):
I still have the same issue, just restarted my instance after 3 days, the swap was full.
I will try to set up better monitoring this week.
@koenieee commented on GitHub (Dec 26, 2025):
Same issue here. I have to reboot the whole vps machine with 1 gb of ram every day now.
@jjeuriss commented on GitHub (Dec 26, 2025):
I'll try version 1.12.3 to see if this also existed there. (I know it ran fine on 1.11.1, but don't know yet whether it regressed in 1.12 or in 1.13). I'll keep the 3rd party components (traefik and gerbil) the same latest version.
Interestingly, the base memory usage of pangolin dropped from about 280MB to about 243MB.
@Josh-Voyles commented on GitHub (Dec 26, 2025):
Last night, I upgraded my t2.micro instance to a t3a.small instance to see if it just needed more memory. However, I can see now that, over time, the docker container memory usage gradually increases. It takes about 5 hours, but then my instance becomes unusable and CPU hits 100 percent. In the attached image from docker stats, you can see the abnormally high memory usage. The other two services aren't visible because things started to glitch before losing connection. I'm happy to investigate further if needed.
@laugmanuel commented on GitHub (Dec 26, 2025):
I can confirm this as well.
I've attached screenshots of memory usage of the three components of the last 24h.
I upgraded to 1.14.0 on 23.12. and to 1.14.1 on 24.12.
I've also attached a pangolin memory usage graph for the last 7 days, and it looks like for me it started happening on 24.12. in the evening. This seems to suggest that it has to do with that release (1.14.1)...
Is there a way to dump and analyse node memory footprint to find the cause?
@Josh-Voyles commented on GitHub (Dec 26, 2025):
As others have mentioned, happy to help as much as I can.
But for now, I set up a CloudWatch alarm on AWS: when my CPU exceeds a 60 percent average for 1 minute, it sends me an email and reboots my instance.
Also, more investigation reveals I seem to have started having issues right around the time I updated my Newt instances from 1.6 to 1.8.
Note: I'm updating Newt to 1.8.1 and will monitor if anything changes.
@SamTV12345 commented on GitHub (Dec 26, 2025):
I'm idling at around 250 MB of memory usage. Can that also limit performance? Over a 1 Gb fiber connection, I'm only getting 50-60 MB/s through Pangolin in a speed test. Wouldn't it have been possible to build the server in Go? It has a much lower memory and CPU footprint than Node.
@laugmanuel commented on GitHub (Dec 26, 2025):
I've limited the memory consumption via Docker memory limits to 769M (an arbitrary value). This seems to help my setup. The container still leaks memory up to the limit, but stays stable afterwards and is not getting killed.
(see stable line at the end at ~20:40)
@Josh-Voyles commented on GitHub (Dec 26, 2025):
@laugmanuel This looks like it could be helpful. When you hit your memory limit, are you experiencing any performance impact to Pangolin?
@laugmanuel commented on GitHub (Dec 27, 2025):
I'm not really running any performance sensitive applications over Pangolin, but so far it seems to work just fine.
Edit: after some time, Pangolin gets killed for me - however, it's far better than exhausting the entirety of VM memory...
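To confirm it's really Docker's OOM killer cycling the container (rather than a crash), a quick check, assuming the container is named pangolin:

# Did the last exit come from an OOM kill, and how often has it restarted?
docker inspect -f 'OOMKilled={{.State.OOMKilled}} ExitCode={{.State.ExitCode}} Restarts={{.RestartCount}}' pangolin

# Kernel-side confirmation of OOM events:
dmesg -T | grep -iE 'oom|killed process' | tail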
@laugmanuel commented on GitHub (Dec 28, 2025):
I did some more digging using Node inspection tooling (--inspect + port-forward + Chrome DevTools) and checking memory allocations over time.
To me it looks like the memory usage only increases with denied requests. The screenshot below shows this. The part between the red lines is caused by unauthorized requests; the same amount of requests with valid auth shows almost no allocations (green):
I'm struggling to export the memory snapshot, so I can't look into that right now. If someone has some ideas on how to investigate further, I'm more than happy to help.
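If the DevTools export keeps failing, Node can write snapshots straight to disk on a signal instead; a sketch, assuming you can edit the container's start command and that the app's working directory is /app (the --heapsnapshot-signal flag exists since Node 12):

# Add the flag to the container command, e.g.:
#   node --heapsnapshot-signal=SIGUSR2 --enable-source-maps dist/server.mjs

# Trigger a snapshot (assumes node is PID 1 in the container; check with ps):
docker exec pangolin kill -USR2 1

# Copy the written Heap.*.heapsnapshot file out and open it in DevTools
# (the exact filename is timestamped; this one is illustrative):
docker cp pangolin:/app/Heap.20251228.120000.1.0.001.heapsnapshot .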
@koenieee commented on GitHub (Dec 28, 2025):
I have tried your fix, but it still keeps getting killed. Thanks for the tip, how can we limit the memory usage better?
@laugmanuel commented on GitHub (Dec 28, 2025):
It was not meant as a fix, but as a temporary workaround to limit the impact of Pangolin to itself and not affect other resources on the same host.
One general question: do you, by any chance, use some sort of monitoring tool to check Pangolin itself or a resource protected by it? I've noticed that Pangolin returns HTTP 200 even if the resource is protected and the request is unauthorized. The reason is that the client is redirected to the Pangolin auth page, which is served successfully. In my case, it looks like a major contributor was my monitoring.
However, even if that is the case, it's still a potential DoS if denied requests leak memory...
@Josh-Voyles commented on GitHub (Dec 28, 2025):
If you're getting 200 for protected resources, that seems odd. I'm getting code 302, which is a redirect.
@sambilbow commented on GitHub (Dec 28, 2025):
Yes. I use Gatus and get 200 on proxied private resources. But it might be following the redirect I guess?
@laugmanuel commented on GitHub (Dec 28, 2025):
If the monitoring is following the redirect, a 200 is expected, as it's the result of the Pangolin auth page. However, this redirect still causes a No Valid Auth event in Pangolin and results in my observed behaviour above (memory allocations).
@nath1416 commented on GitHub (Dec 29, 2025):
This would make sense; I do have a misconfigured Gatus health check that results in multiple Denied events. This could be related. I will turn off the endpoint in Gatus and check if it still crashes.
I tried your suggestion @laugmanuel to limit the RAM usage for the Pangolin container. It resulted in a restart of the container; after that, it is working fine so far.
@oschwartz10612 commented on GitHub (Dec 31, 2025):
Could anyone confirm if turning off the request logs fixes the memory
problem?
@oschwartz10612 commented on GitHub (Dec 31, 2025):
Also interested if anyone could turn on debug logs and watch the cache print statements to see if we are building memory in there. I don't think so, but would like to check. I think this is in 1.14.
@Josh-Voyles commented on GitHub (Jan 1, 2026):
I've turned request logs off and will report back tomorrow.
@kazooie13 commented on GitHub (Jan 1, 2026):
Same problem for me after updating: memory runs full over time, then Pangolin locks up and shortly after that the VPS (very limited – 1 GB RAM) crashes. Pangolin idles at around 350 MB. I use WebDAV over Pangolin in combination with path rules, so there are many requests. Unfortunately, I can’t tell if the problem existed before the upgrade, because I added the path rules shortly after the upgrade. I’m trying to turn off the request logs. So far the consumption is still relatively high, and I can’t yet tell whether it will continue to increase and crash.
@Josh-Voyles commented on GitHub (Jan 1, 2026):
Turning off request logs does not solve the issue.
@oschwartz10612 commented on GitHub (Jan 1, 2026):
Can anyone pinpoint whether it's the traefik container, pangolin, or gerbil? Are we 100% sure it's the pangolin container? Just want to be sure, because I know there were previous issues with Traefik running away with memory.
@Joly0 commented on GitHub (Jan 1, 2026):
Not sure about the others, but when I look at my server with htop, the process that's currently consuming about 15 GB (out of 16 on my server) is "node --enable-source-maps dist/server.mjs".
I am not sure what process exactly this is or which container is running this, just maybe someone else can tell that.
Edit: Use docker exec pangolin ps a to get all the processes in the pangolin container; the mentioned node process is the main thread in the pangolin container. So this error pretty surely comes from pangolin itself and not traefik or gerbil.
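To attribute a runaway host PID from htop to a container without guessing, something like this works (12345 is a placeholder PID):

PID=12345   # the node PID reported by htop on the host
# docker top prints host-namespace PIDs (column 2), so scan all containers:
for c in $(docker ps --format '{{.Names}}'); do
  docker top "$c" | awk -v p="$PID" '$2 == p { found = 1 } END { exit !found }' \
    && echo "$PID belongs to $c"
done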
@sambilbow commented on GitHub (Jan 1, 2026):
Same as above. I posted some images of usage within the container on Discord.
@kazooie13 commented on GitHub (Jan 2, 2026):
I can confirm that "node --enable-source-maps dist/server.mjs" is the affected process and that it originates from Pangolin itself.
Also, disabling request logs does not resolve the issue.
@laugmanuel commented on GitHub (Jan 2, 2026):
I can also confirm, that it's the pangolin container itself:
I didn't see any more problems after fixing the monitoring check (mentioned above). I reverted that fix and enabled debug logging for Pangolin to see cache growth - however, the caching seems to work just fine and doesn't show any abnormalities:
@oschwartz10612 commented on GitHub (Jan 2, 2026):
Hm, this is going to be a tough one. I will see about building the container with --inspect in the node command to see if I can use the heap inspector from Chrome to find where memory is building up.
@0i5e4u commented on GitHub (Jan 2, 2026):
I disabled every non-HTTPS resource, and the container seems to stay at a stable RAM consumption.
Non-reachable targets are also disabled for testing. Can someone confirm this?
Running since 14h:
root@ubuntu:~# docker ps | grep pangolin
37b8f017bf25 fosrl/pangolin:1.14.1 "docker-entrypoint.s…" 14 hours ago Up 14 hours (healthy)
CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
9bc00849ddba traefik 0.08% 34.38MiB / 848.6MiB 4.05% 230MB / 232MB 523MB / 49.2kB 7
8e621351d8dd gerbil 0.00% 5.949MiB / 848.6MiB 0.70% 230MB / 232MB 58.7MB / 0B 7
37b8f017bf25 pangolin 2.09% 290.4MiB / 500MiB 58.09% 20.6MB / 137MB 652MB / 174MB 22
@buster39 commented on GitHub (Jan 3, 2026):
Reverted back to 1.13.1 and deleted all of crowdsec in my setup - stable again for approx. 3 days now.
@kazooie13 commented on GitHub (Jan 3, 2026):
I am also using crowdsec, with a new postoverflow whitelist rule. I don't know whether that could have an impact on the problem (I don't think so), and I haven't tried without it yet.
@Josh-Voyles commented on GitHub (Jan 3, 2026):
Crowdsec was never part of my config and I still had issues. I am trying to revert back to 1.13.1. I still have request logs off.
@kazooie13 commented on GitHub (Jan 3, 2026):
Is there any instruction on how to revert to version 1.13.1? Can I simply adjust the Compose file, or will that cause conflicts with the database? I urgently need a stable system and a temporary fix for the problem.
@joerg-hro commented on GitHub (Jan 3, 2026):
I had version 1.10.1 installed. Before updating to 1.13.1, I backed up the config-directory. Since I had the same problems with version 1.13.1, I adapted the Pangolin version of the docker-compose file to version 1.10. Before deploying the docker-compose.yaml file, I restored the config-directory. Now Pangolin is running perfectly again.
@jjeuriss commented on GitHub (Jan 3, 2026):
T.b.h. I don't understand why the last comments are about reverting to 1.13.1. The problem seems to have been introduced in version 1.13.1 as per the OP, so you'd have to revert to something older to get rid of it (e.g. 1.11.1, or maybe 1.12.3, but nobody has confirmed/denied yet that 1.12 was stable).
AFAIK you can only revert to 1.11.1 if you took a backup of the whole config directory (and docker-compose.yml).
@buster39 commented on GitHub (Jan 3, 2026):
True - I just told what happened to me. Maybe we'll find a point where to dig.
@jjeuriss commented on GitHub (Jan 3, 2026):
@buster39 did you reinstall everything or revert to a backup?
@jjeuriss commented on GitHub (Jan 3, 2026):
By the way, I noticed that the base memory usage of Pangolin (i.e. after running it for about 5 min) on 1.11.1 is 230MB, whereas it is 305MB on 1.14.1. Both without crowdsec. I hope the increased memory usage is expected.
@buster39 commented on GitHub (Jan 3, 2026):
Just a little setup on the smallest VPS with 1 GB RAM - if I remember correctly, the lock-ups of my server started after upgrading to 1.13.1. The server became unreachable a few hours after restarts.
Upgrading to 1.14 didn't help. I had some crowdsec-related issues in the log, so I decided to start with removing crowdsec from the setup.
I had no backup of the config - I just changed the versions in the compose yml and deleted everything related to crowdsec.
But - of course - I had to pull the docker containers again for the different versions.
@jjeuriss commented on GitHub (Jan 3, 2026):
Ah yeah. I tried running an earlier version of Pangolin (1.10 or lower) on a 1GB VPS with Crowdsec too. This simply locks up the VPS, as there's not enough memory to run Pangolin + Crowdsec on a system with so little memory. I don't think that's related to this bug though.
This bug describes that on 1.13+ the memory usage of Pangolin increases in certain scenarios due to a memory leak (which in turn can also lock up a VPS with only 1GB of memory).
@jjeuriss commented on GitHub (Jan 3, 2026):
@oschwartz10612 I still seem to be able to reproduce as follows with failed requests.
I've got photos.mydomain.com forwarded through Pangolin. Through that domain I can browse my photos on my Synology NAS in a web browser without a problem. But somehow, the Synology Photos app on my phone, which points to this same domain, is not working correctly yet. So if I run the Photos app on my phone on my 4G network and it connects to my NAS through Pangolin, the images do not show up (probably failing somewhere). The fact that viewing my photos doesn't work is out of scope for this discussion; I'm just using it as a repro scenario.
To make the memory shoot up (from 350MB to 395MB in 1-2 minutes), I just need to browse the Synology Photos app on my phone on my 4G network (which then doesn't show any images at all). Doing this shoots up the memory, and it doesn't seem to go down automatically any more. It also shoots up the CPU usage. The longer I scroll (i.e. issue failing requests), the more memory is consumed.
Note: occasionally I do see memory dropping again.
This repro scenario may be useful for debugging this, but I have no experience with inspecting or profiling containers. If you can let me know how to inspect that, I can further assist with that.
As a workaround for this memory leak problem, I've set a fairly conservative memory limit (400MB in my case), which then auto-restarts the container, preventing my VPS from locking up. You can see the auto-restart of the container in the graph.
I'll now do 1 more test with setting the limit at 500MB and see whether memory usage goes down again over time.
Update: when my pangolin usage goes above 425MB, my 1GB VPS simply dies and I need to reboot it.
@AlexWhitehouse commented on GitHub (Jan 5, 2026):
I am on Pangolin v1.12.1 and am experiencing the same behaviour.
I'm getting a weird periodic CPU usage spike, alongside ever increasing memory consumption with the htop output showing the above process from the Pangolin container being the culprit.
@hansencheck24 commented on GitHub (Jan 6, 2026):
I'm on pangolin v1.14.1 and having the same issue. I currently disabled the maxmind_db_path in the config and disabled the log retention in settings.
The 3 spikes in memory happened before I disabled both of them, and it's currently running with stable memory usage. Hope this helps.
@oschwartz10612 commented on GitHub (Jan 6, 2026):
Yes, that helps, thank you! Can anyone else repro?
@kazooie13 commented on GitHub (Jan 6, 2026):
Unfortunately, that didn't work for me. I'm still getting the same CPU spikes as mentioned above; disabling the path in the config and deactivating the log retention didn't make any difference.
(The problem already existed for me before I even added the country path.)
@Josh-Voyles commented on GitHub (Jan 7, 2026):
A couple of findings:
I reverted from 1.14.1 to 1.13.1 with request logs still off and didn't have any issues for a couple of days.
Then, on 1.13.1, I tried turning request logs back on and had issues the same night.
I turned request logs off again (still running 1.13.1), and everything seems to be running fine.
I'm looking at my pangolin container after running for over 24 hours and memory still seems normal.
I'm running a t2.micro on AWS with 1 vCPU and 1GB of memory.
I hope this info helps track down the problem.
@jjeuriss commented on GitHub (Jan 7, 2026):
I developed an extremely lightweight Prometheus exporter container to measure CPU, memory, PID and disk statistics of containers running on a host. This allows me to inspect the containers of the Pangolin solution (pangolin + traefik + gerbil) on my 1GB VPS. I had to develop one because typical solutions like cadvisor were consuming around 100MB, which would run my VPS out of memory even more quickly. My custom exporter consumes about 10MB, so that's workable. If you're interested, you can find it here: https://github.com/jjeuriss/tiny-docker-exporter
I keep seeing the same thing: the Pangolin solution is stable until you start doing some extra monitoring (e.g. uptime-kuma) or letting it handle requests it cannot fulfill (in my case that's scrolling through my photos in my Synology Photos app, which doesn't seem to work through Pangolin). Even my Docker memory cap of 400MB doesn't make my setup stable: the container doesn't seem to get killed in time; my VPS just hangs and SSH is impossible till I reboot it.
I tried my failure scenario (scrolling through photos it cannot access) once more and noticed a high peak in storage being read/written. That may point to some problem?
With these new graphs and a way to reproduce, I'd really like to try some things out.
I tried turning off requests logs through the Pangolin GUI, but that hasn't reduced the read/write throughput:
I don't have GeoLite2-Country set up, so I don't see how disabling maxmind_db_path could help. How do I disable that exactly, by the way?
Is there any way I can see what Pangolin is doing reading and writing this much?
@asardaes commented on GitHub (Jan 7, 2026):
@jjeuriss for disk IO, see my comment above; that's the exact issue I had. It was memory pressure at the kernel level, which led to using the boot disk as a kind of swap even without a swap partition configured, and that killed the whole VM hard.
@jjeuriss commented on GitHub (Jan 8, 2026):
Nice, thanks, @asardaes! That workaround actually works. I can't seem to crash my Pangolin anymore with it, and even with ZRAM filled up, it stays responsive.
After scrolling a long time through my (dysfunctional) photos, I was still able to fill up the ZRAM. Ideally, at some point Pangolin evicts that data again. I'll monitor that. I assume the memory leak described here is exactly about that, and the ZRAM now just delays the point of failure.
These were the commands I used to enable ZRAM (as @asardaes advised), put in copy-paste-ready format for future reference; a consolidated sketch follows the steps below.
Stop Docker containers:
Install zram-tools:
Configure zram (512MB):
Restart zramswap:
Apply VM tuning:
Restart Docker containers:
Verify:
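The command snippets themselves didn't survive the mirror; a reconstruction of the same steps for Debian-based systems (the sysctl values are illustrative - take the exact ones from the Arch wiki section referenced above):

docker compose down                        # stop Docker containers
sudo apt install zram-tools                # install zram-tools
# configure a 512 MB zram device (SIZE is in MiB):
printf 'ALGO=zstd\nSIZE=512\n' | sudo tee /etc/default/zramswap
sudo systemctl restart zramswap            # restart zramswap
# apply VM tuning (illustrative values; these are the keys named in the
# revert steps below):
printf 'vm.swappiness=180\nvm.vfs_cache_pressure=500\nvm.overcommit_memory=1\n' | sudo tee -a /etc/sysctl.conf
sudo sysctl -p
docker compose up -d                       # restart Docker containers
swapon --show && zramctl                   # verify zram swap is active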
If you want to revert this, you can execute the following (see the sketch after this list):
Stop Docker containers
Stop and disable zramswap
Uninstall zram-tools
Remove VM tuning parameters from /etc/sysctl.conf: remove the lines added (vm.swappiness, vm.vfs_cache_pressure, vm.overcommit_memory)
Apply sysctl changes to reload defaults
Restart Docker containers
Verify the revert
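And a matching revert sketch, under the same assumptions:

docker compose down                        # stop Docker containers
sudo systemctl disable --now zramswap      # stop and disable zramswap
sudo apt purge zram-tools                  # uninstall zram-tools
# remove the added sysctl lines:
sudo sed -i '/^vm\.\(swappiness\|vfs_cache_pressure\|overcommit_memory\)=/d' /etc/sysctl.conf
sudo sysctl -p                             # reload (a reboot restores kernel defaults)
docker compose up -d                       # restart Docker containers
swapon --show                              # verify the zram swap is gone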
@jjeuriss commented on GitHub (Jan 9, 2026):
I've enabled debug logs and added some extra prints to check memory usage.
When my photos app has SSO enabled (and thus needs to go through an extra authentication step), it results in a flood of unauthenticated requests. These unauthenticated requests seem to massively increase heap memory.
When I turn off SSO authentication, my app correctly shows my images and the heap memory stays constant!
Clearly memory is being leaked for unauthenticated requests, even with request logs disabled in the GUI.
I'm working on a fix!
@Josh-Voyles commented on GitHub (Jan 9, 2026):
I'll just stay on 1.13.1 with request logs off until a fix is released. Stable for a few days now.
@jjeuriss commented on GitHub (Jan 10, 2026):
Did a few more tests to see where the high disk I/O problems started, because I think those are at the root of the memory leak. Sure, the ZRAM workaround from @asardaes helps to avoid them, but it hides the problem: they then build up as zram swap.
I repeated the same reproduction scenario (scrolling through photos that are each getting an unauthenticated error) on each of the Pangolin versions:
Clearly the problem started at 1.13.0 (and remains on 1.14.1 by the way), so the diff from 1.12.3 to 1.13.0 should reveal it.
Now, I still need to figure out what's causing it... Already tried a couple of things on my fork, but so far, no luck.
I'm thinking now it might be related to the analytics that were added.
@Yonoesio commented on GitHub (Jan 13, 2026):
Investigation: Memory Leak Management in Pangolin (Virtualized Environment with OPNsense/ICMP)
⚠️ Disclaimer and Methodology
This document is the result of experimental research. The author (user) states that they do not possess the deep technical knowledge of systems engineering required to resolve the underlying root cause within the Pangolin source code.
The resolution of this problem was achieved through a process of trial and error based strictly on tests performed in a production environment, with the support of Gemini AI for diagnosis, technical structuring of solutions, and the generation of this English translation. The methods described here are "configuration patches" to ensure service availability, not a fix for the original software's code.
1. The Problem: The Memory Leak
During monitoring with htop and docker stats, an uncontrolled growth of the RES (Resident Set Size) memory of the Pangolin process was detected.
2. Testing and Diagnosis
Stress tests were conducted focusing on traffic persistence:
3. The Solution: The Docker "Cage" (Hard Limits)
Since correcting the memory leak in the source code was not an option, a "self-cleaning" mechanism for the container was implemented.
It was discovered that the standard deploy: resources block in Docker Compose does not effectively stop swap usage in standalone environments. Therefore, a configuration patch was applied using host directives to strictly "enclose" the process.
Configuration Patch (docker-compose.yml):
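The YAML itself wasn't mirrored, but the same host-level limits described below (1.8 GB hard cap, swap pinned to the same value, always restart) can also be applied to a running container from the CLI; a sketch with an assumed container name:

# Cap memory, pin swap to the same value so the container cannot spill
# into host swap, and let Docker restart it after an OOM kill:
docker update --memory=1800m --memory-swap=1800m pangolin
docker update --restart=always pangolin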
4. Comparative Results
After applying the limits and observing the system, the results are as follows:
5. Technical Conclusion
The implemented solution acts as a safety circuit breaker. Upon reaching the 1.8 GB limit, the Linux kernel (via Docker) executes an OOM Kill on the container. Thanks to the restart: always policy, the container restarts in less than 3 seconds with clean memory (400 MB), preventing the VPS from collapsing.
This method ensures that, despite the memory leak, the 15 VMs maintain 99% uptime without manual intervention.
@jjeuriss commented on GitHub (Jan 13, 2026):
Yeah, limiting memory helps in some cases, @Yonoesio, as was mentioned earlier in this thread by @Ragnaruk in https://github.com/fosrl/pangolin/issues/2120#issuecomment-3683502087 . In more extreme cases (e.g. a VPS with low memory and a high volume of unauthenticated requests), it doesn't prevent the VPS from hanging.
@jjeuriss commented on GitHub (Jan 13, 2026):
Still haven't found the root cause of the leak. Help is welcome.
My VPS is also a bit too small to test this on properly, because whenever memory hits about 450MB, my VPS already hangs. So I have about 100MB of RAM to play with from the ~350MB it starts up with.
@oschwartz10612 commented on GitHub (Jan 14, 2026):
Appreciating all of the feedback, everyone. We are going to put some real effort into this before 1.15 to see if we can resolve it. Worried it's DEEP 😅
@Yonoesio commented on GitHub (Jan 16, 2026):
Key findings on Memory Stability and Storage Drivers (Assisted by Gemini AI)
**Technical Disclaimer:** This report was structured and translated with the assistance of Gemini AI. The user (author) performed the empirical testing and environmental changes but does not claim deep expertise in systems engineering. The findings below are based on recent, real-world observations.
Recent Findings (Casual Discovery): I would like to share a significant and somewhat casual discovery regarding the memory leak reported in this issue. After experiencing constant system crashes on a 4GB RAM VPS, I performed a clean migration of my stack. I cannot strictly confirm if there is a direct technical correlation, but the change in stability has been spectacular.
I migrated from a containerd setup to Docker with the overlay2 storage driver (keeping hard memory limits in the deploy block to ensure enforcement).
Real-time Observations: While I am waiting for more time to pass to generate a complete usage graph, the visual evidence is clear. I am now seeing active memory releases every 10 to 15 minutes, a behavior that was non-existent before.
Conclusion: I don't have the technical background to explain why, but switching to the Docker storage driver (overlay2) combined with hard limits has transformed a broken system into a stable one. Previously, memory grew linearly until a total host crash. Now, the Node.js garbage collector seems to be functioning correctly in a "sawtooth" pattern.
I will provide a full graph once it's completed, but I wanted to share this "spectacular" improvement immediately, as it might provide a clue to the developers or relief to other users.
@Josh-Voyles commented on GitHub (Jan 18, 2026):
Quick update: on 1.13.1, even with logs off, I eventually had problems again; it just took a whole week to manifest.
@rex1234 commented on GitHub (Jan 18, 2026):
No settings mentioned here helped me get rid of this memory leak. The node process starts eating my memory until Pangolin restarts, which makes all sites unavailable for a while, multiple times per day. This bug should finally get full attention; it has been open for more than a month, and the availability of all running services suffers from it.
@jjeuriss commented on GitHub (Jan 22, 2026):
I agree, this is IMHO the top priority bug.
I can't use v1.14.1 or higher for more than half a day due to this bug. I've already tried a few things on my fork of this project, but none have resolved it so far.
Things I know so far:
- DISABLE_AUDIT_LOGGING=true → issue still reproduced (not audit logging)
- DISABLE_GEOIP_LOOKUP + DISABLE_ASN_LOOKUP → issue still reproduced (not geo/ASN lookups)
- DISABLE_RULES_CHECK=true → issue still reproduced (not rules check)
- DISABLE_SESSION_QUERIES=true → issue NOT reproduced, but this breaks auth entirely (not a viable fix)
Attempted fixes (all failed to resolve the issue):
- … reduction in database queries and kept them fast, but the memory still grew, I/O spikes still occurred, and the VPS still froze
- … prevented the issue.
The problem remains 100% reproducible and makes v1.13.0+ unusable in production with high volumes of unauthenticated requests.
At this point I need help from the core team to identify what changed in v1.13.0 that could cause this. I've ruled out the suspects I could think of and am stuck.
@Vangreen commented on GitHub (Jan 26, 2026):
For me, version 1.15 fixed the problem.
Before, there were 1-2 restarts per day. Now it runs for 2 days without high RAM usage.
@oschwartz10612 commented on GitHub (Jan 26, 2026):
Good to know @Vangreen thank you! I forgot to update this thread but we made some improvements in 1.15. Could everyone try it out and let me know if the issue still persists?
@n1LWeb commented on GitHub (Jan 27, 2026):
For me the issue still persists in 1.15.1, and the process locks up way before the 24GB memory of my VPS is filled. My limit is now set at 1000MB, and the restart happens about every 2 hours.
The limit is set in Docker to prevent pangolin from growing more and locking up eventually.
@Vangreen commented on GitHub (Jan 27, 2026):
@n1LWeb do you have health checks for resources set up? I noticed that when I have status monitoring set up for my resources, this behavior with high RAM usage begins.
@n1LWeb commented on GitHub (Jan 27, 2026):
@Vangreen Yes, for almost all resources; I'm using multiple targets per resource. At the moment I'm just testing with 2 newt connections over the same internet connection, but soon the 2 newt connections will be routed via 2 different internet connections (DSL/fibre). Then I'll need the health checks so pangolin will not route over a failing connection.
@rex1234 commented on GitHub (Jan 27, 2026):
I can confirm that the issue is caused by unauthenticated requests that some of my services were making; after I fixed that and all are properly authorized, the memory leak seems to be gone.
@jjeuriss commented on GitHub (Jan 27, 2026):
I’m still seeing the same issues in 1.15.1. This thread already points out that unauthenticated failures are at the root of the problem. Note that unauthenticated requests aren’t only caused by monitoring; they’re also triggered by bots crawling a domain and trying every available page. Monitoring does make it worse, but it isn’t the sole cause.
The underlying unauthenticated-request problem does not appear to be fixed. In 1.12.3, these failures do not result in significant I/O usage or memory spikes. However, starting with 1.13.0 and continuing through 1.15.1, they clearly do.
After sending a burst of unauthenticated requests (~500–1000), I observed a massive spike in read I/O on my VPS.
A few hours later, the system completely hung after a second spike occurred (not manually triggered, probably a scan).
These kinds of high I/O peaks don't occur on 1.12.3 (I used that version again for the last 4 days and saw no issues). Going back to it now. Looking forward to a fix for this issue still!
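For anyone who wants to reproduce this without a misbehaving app, a crude burst of unauthenticated requests against any protected resource shows the same pattern; a sketch (the domain is a placeholder):

# Fire ~1000 unauthenticated requests and watch `docker stats` in a
# second terminal while they land on the auth redirect:
for i in $(seq 1 1000); do
  curl -s -o /dev/null "https://photos.example.com/?n=$i" &
done
wait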
@SamTV12345 commented on GitHub (Jan 28, 2026):
Same issue for me. With 1.15.1 my 1GB vm is OOM after less than 2 days.
@Boscovitz commented on GitHub (Jan 28, 2026):
Same here with 1GB. Every 2-3 days and I have to restart the vps because it hangs OOM.
@oschwartz10612 commented on GitHub (Jan 30, 2026):
Hm, wonder if it's a dependency... Will experiment.
@formless63 commented on GitHub (Feb 4, 2026):
Happening to me as well. 2GB VPS goes OOM every 24-36 hours.
Edit: can confirm this continued when bumping to 1.15.2 as well. Potentially even happening faster now.
@maiestro commented on GitHub (Feb 12, 2026):
Hello,
I wanted to ask if there is any news regarding this issue? I am currently using Pangolin v.1.11.1, which is essentially the last version where the memory issue has not occurred.
I am grateful for any information.
Best regards
@ghostklart commented on GitHub (Feb 21, 2026):
Hello, I've actually started to have the same issue on 1.15.4.
Will try to revert back to 1.12.3.
@N3m351x commented on GitHub (Feb 24, 2026):
Did the downgrade fix your issue?
@ChrissiBe commented on GitHub (Feb 24, 2026):
Same issue on my VPS with 1GB memory. After 1-2 days the system ran out of memory. No login possible.
Downgrading to 1.13.0 works with no problems. I tried with Debian and Ubuntu minimal configurations but had the same problem.
1.13.0 works; 1.13.1 and newer run out of memory.
@maiestro commented on GitHub (Feb 24, 2026):
I also have a VPS with 1GB RAM. The problem occurred after about 6 hours for me.
I had problems with the following configuration:
Pangolin v. 1.11.1
Gerbil v. 1.3.0
Traefik v. 3.6.7 (plugins: badger v. 1.3.1; geoblock v. 0.3.6)
@ChrissiBe: Could you please also tell me your (currently working) version numbers? I would then like to try exactly the same setup on my VPS.
The version numbers for Pangolin, Gerbil, and Traefik are located in {DOCKERINSTALLPATH}/config.yml, and for the Traefik plugins under {DOCKERINSTALLPATH}/traefik/traefik_config.yml in the experimental->plugins section.
EDIT: Pangolin Version
@kazooie13 commented on GitHub (Feb 24, 2026):
Could you please provide us with an update on the current status of the issue?
It has been open for over two months now and has become the issue with the most comments/reports.
If no more resources are being allocated to resolving it, I will need to look for an alternative.
@ChrissiBe commented on GitHub (Feb 24, 2026):
@maiestro
Pangolin 1.13.0
Gerbil 1.3.0
Traefik 3.6.7
I tested all of the Pangolin versions up to 1.15.4.
All versions have this out-of-memory problem. I did fresh installations with Ubuntu (22 and 24) and Debian (12 and 13).
Only 1.13.0 and older are working.
Now I installed with the quick-setup script, ran docker compose down, set the Pangolin version to 1.13.0 in the .yml,
then docker compose pull and docker compose up -d.
@huzky-v commented on GitHub (Feb 25, 2026):
Not sure if it helps, but I tried downgrading the zod dependency to v3 and updated the code to adapt to zod/v3 with codex (so use it at your own risk; some schema definitions may not be accurate). The downgrade is based on the 1.15.4 codebase.
Observed with --inspect that the heap usage is lower when starting the pangolin stack.
Wonder if anyone can test that with real traffic, as I don't have that VPS size and traffic.
The POC branch is here: https://github.com/huzky-v/pangolin/tree/zod-v4-to-v3 - you may check the code and build the Docker image (or you can just use docker.io/xerial817/pangolin-zod-poc:latest if you are in yolo mode).
And there is also a still-unsolved report of a zod/v4 memory leak issue: https://github.com/colinhacks/zod/issues/5490
EDIT: seems not working 😫
@n1LWeb commented on GitHub (Feb 26, 2026):
@huzky-v I tried it and got the increasing memory usage on this version, too.
@duchu commented on GitHub (Feb 27, 2026):
On version 1.16.0, the problem still occurs.
@Josh-Voyles commented on GitHub (Feb 27, 2026):
Are we missing something? The release candidate said no known bugs. Do I need to scrap my install and rebuild from scratch? Use RHEL instead of Ubuntu? I'm happy to do whatever; I just need instructions.
Also, is it just this thread of people having issues? What are the other users doing that we aren't?
@formless63 commented on GitHub (Feb 27, 2026):
I get the feeling that there's been zero effort to investigate. I'm wondering if we all have something in common.
Personally: Ubuntu, running in docker compose, multiple newt sites, many popular selfhosted apps underneath. I added f2b on the VPS after this started, as I initially assumed it was related to getting spammed with failed connection/auth attempts.
When digging and trying to resolve I made some notes:
@Josh-Voyles commented on GitHub (Feb 27, 2026):
I know there's been work by the devs and community, but it seems like it's not clear what's going on. I'm sure if the majority of users and their SaaS platform were having issues, all efforts would be focused on this. However, I'm not convinced that's the case.
So, that's why I'm asking what needs to change on my end.
@AlexWhitehouse commented on GitHub (Feb 27, 2026):
I was having the issue; I restarted the container having changed nothing and am no longer experiencing it. Unhelpful, I know, but it suggests this is more of a race condition than something permanent.
@SamTV12345 commented on GitHub (Feb 27, 2026):
It still occurs for me. I "solved" the issue by adding a cron job that runs every midnight where my 1 GB VPS is restarted.
@huzky-v commented on GitHub (Feb 27, 2026):
I have tried some testing on my 1GB testing VPS; the command is
echo "GET https://resource.protected.ltd" | vegeta attack -duration=3600s --rate=10 | vegeta report
basically sending 10 requests/s to the target protected resource.
Things I tried:
- 1.12.3, which is a node 22 alpine image
Here are some of my observations during my debug, build, test loop:
- Even on 1.12.3, the memory will still grow on unauthenticated requests and make docker stats unable to return data.
- The base memory usage of 1.12.3 is around 200MB, and it grows with the version progress.
- src/app/auth/resource/[resourceGuid]/page.tsx, which shows the auth page when not authenticated: the memory still grows.
- The zod library, as stated above: it did cut some of the heap usage, like 10%, and docker stats will not hang as soon, but it will still crash.
- In the heap: string, i18n stuff, some zlib hanging around.
- 1.12.3 is still OK because the base memory usage is low enough for the VPS to run, and there is headroom to GC. The memory leak? I think it still exists, but it's just too hard to locate, and hard to find a minimal reproducible snippet for it.
Dang. I am not affected by this case (I have a large-memory VPS instance, and frankly not growing too much memory because of that), but I am literally out of ideas.
@n1LWeb commented on GitHub (Feb 27, 2026):
Some insights:
If I disable all health checks, the memory usage is mostly stable. But I need them, as most of my resources are reachable over two different newt instances.
Maybe the people without the issue didn't enable health checks yet?
I switched from my 1GB x86 VPS to my 24GB ARM VPS, but pangolin still crashes if it grows over 1GB of RAM usage. Setting a 900MB limit in Docker restarts the container about every 2 hours, but it's usable.
@Alloc86 commented on GitHub (Feb 27, 2026):
Just to chime in as I have a different scale of proxy on my end:
I don't get reproducible issues, but it locked up due to memory issues maybe 3-5 times. The current "session" has been fine for 3-4 weeks already though (with no change either); maybe fewer bots hitting it or something.
@oschwartz10612 commented on GitHub (Feb 27, 2026):
Thanks everyone for the continued information and concerns.
We are looking at it, but it has been hard to pin down. With all of the reports in here, I am not sure there has been a "smoking gun" I can just go fix. On top of that, despite doing updates to packages from dependabot, that did not fix it either, if it is a dependency thing.
I will try to make it a point to look into this again with the new info ASAP, and maybe we can do a patch or two or something.
What is even more baffling is that we have thousands of users and sites on the cloud, yet we don't see the issue, LOL. So all I can say is we are still thoroughly confused but want to get this resolved!
I would highly suggest adding resource limits to the container though - docker should handle killing it and restarting it:
https://docs.docker.com/reference/compose-file/deploy/#resources
@formless63 commented on GitHub (Feb 27, 2026):
If there is anything specific those of us who are affected can do to help, please let us know. I'm happy to set up custom logging of some sort if there are configurations that might bring more details to light for you to work with. - or any other potential item that might produce good data for you.
Thanks for all of the work you do!
@Joly0 commented on GitHub (Feb 27, 2026):
Btw, I was affected by this problem as well a while ago. I had added a lot of things to my pangolin stack (like the traefik-dashboard and other things by hhftechnology). I tried resetting and re-installing pangolin, only adding crowdsec and the geoblock-updater containers to the stack, and so far everything is buttery smooth and stable.
@Ragnaruk commented on GitHub (Mar 2, 2026):
Could it be the sqlite driver? You probably use Postgres in your cloud.
@n1LWeb commented on GitHub (Mar 2, 2026):
I'm using sqlite and have the issue.
Others?
@joerg-hro commented on GitHub (Mar 2, 2026):
me too
@Josh-Voyles commented on GitHub (Mar 2, 2026):
Sqlite here.
@oschwartz10612 commented on GitHub (Mar 4, 2026):
Ahh yes, this is good info. It must be the sqlite driver or something else then. Helps narrow it down! Let me do some thinking.
Maybe it's time to upgrade to libsqlite3 to get off better-sqlite...
@hansencheck24 commented on GitHub (Mar 4, 2026):
I'm using ghcr.io/fosrl/pangolin:postgresql-1.15.1 and have the issue.
@maiestro commented on GitHub (Mar 9, 2026):
In the meantime, I have installed Pangolin on two different small VPSs, each with 1GB RAM, 1-core vCPU, and the latest Debian 13 for testing purposes:
On the IONOS system, a failure occurs almost immediately (even the eth0 interface fails after a while).
Based on my testing, the failure occurs shortly after I visit the Pangolin configuration web interface to make some settings.
On the Netcup system, Pangolin runs with almost 80% RAM utilization, but has been stable so far (3 days).
root@IONOS-VPS# lscpu
root@NETCUP-VPS# lscpu
Perhaps other systems with Pangolin problems look similar to my IONOS VPS system?
@n1LWeb commented on GitHub (Mar 9, 2026):
For me the issue exists on a RackNerd VPS and on an Oracle Free Tier ARM VPS.
On both only if I have activated health checks.
Racknerd:
@0i5e4u commented on GitHub (Mar 9, 2026):
Same here with Problems:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 40 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 1
On-line CPU(s) list: 0
Vendor ID: AuthenticAMD
BIOS Vendor ID: QEMU
Model name: AMD EPYC-Milan Processor
BIOS Model name: pc-i440fx-6.1 CPU @ 2.0GHz
BIOS CPU family: 1
CPU family: 25
Model: 1
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 1
Stepping: 1
BogoMIPS: 3992.49
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm rep_good nopl cpuid extd_apicid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy svm cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw topoext perfctr_core ssbd ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 invpcid rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves clzero xsaveerptr wbnoinvd arat npt nrip_save umip pku ospke vaes vpclmulqdq rdpid
Virtualization features:
Virtualization: AMD-V
Hypervisor vendor: KVM
Virtualization type: full
Hosted on Strato
@harrybaumann commented on GitHub (Mar 10, 2026):
I have the issue on the same 1 GB IONOS VPS as mentioned above. Sqlite version of Pangolin, using 3 Newt connections. There is not much load, as there are only a handful of users.
However, the Pangolin instance stops responding after some hours. Sometimes it stays responsive for 1 or 2 days, but not more. It is then necessary to restart the instance via the IONOS management console.
@joerg-hro commented on GitHub (Mar 10, 2026):
I have the same configuration with the same issues.
@xylcro commented on GitHub (Mar 10, 2026):
Same issue here. Also using sqlite
@Madnex commented on GitHub (Mar 13, 2026):
Same issue here as well. I started to use Gatus for monitoring and had used a config that hit the Pangolin auth page for the checks. Memory consumption of the pangolin container then went up steadily until I fixed the Gatus config. Even then, I still had to redeploy Pangolin to free up the memory. Using Pangolin v1.16.2.
@xylcro commented on GitHub (Mar 16, 2026):
I use Gatus too; how did you fix your config?
@sambilbow commented on GitHub (Mar 16, 2026):
I also use Gatus to hit my proxied endpoints... Interesting
@Madnex commented on GitHub (Mar 16, 2026):
I just added IP-based bypass rules in Pangolin. It was a bit tricky to find out when the check is actually working and not hitting the authentication page. One thing that helped was adding this condition: "[BODY] != pat(*Powered by Pangolin*)"
However, even after fixing this, I see that the pangolin container is steadily grabbing more RAM, just much more slowly now. I assume every time someone tries to access an endpoint and hits the authentication page, it adds up. At least it's not my own monitoring anymore...
@dunamos commented on GitHub (Mar 16, 2026):
Hi!
I have the same issue, and I only noticed it after adding a monitoring tool (Dockhand in my case).
I am not sure which is the chicken and which the egg: did adding the monitoring cause the OOM issues (I have since added a max RAM limit), or was it happening before and I didn't notice?
@formless63 commented on GitHub (Mar 16, 2026):
Also using Gatus to do health checks here, but I do have bypass rules in place.
It certainly seems related to authorization hits. When I set up f2b and started blocking bots and such, the memory growth slowed, but it has not completely resolved.
@harrybaumann commented on GitHub (Mar 16, 2026):
I think I have found "my" issue with a 1 GB VPS. Although the RAM filled up quite fast, it wasn't the reason for the instance becoming unavailable. I noticed that the instance's hard disk of 10 GB was completely full (100%) when the machine stopped working.
Due to a sqlite database that had more and more data in it (multiple GB, maybe logging?) and a Docker image that became larger with every version update, the lifetime of my small instance got shorter.
I've "fixed" the issue by upgrading to a bigger instance with more hard disk space. I can confirm that pangolin has worked well for 5 days now on this new machine running a copy of the original docker volume, so I believe the crashes are gone.
Maybe the issue I had wasn't the issue discussed in this topic.
@dunamos commented on GitHub (Mar 16, 2026):
In my case, my VPS became unresponsive due to kswapd0 using 100% of my CPU.
It makes sense, because my RAM was saturated and my storage space was very low, so no swap was available.
After cleaning my VPS storage a bit and adding a 500M RAM limit to my Pangolin container, I have fewer issues.
@xylcro commented on GitHub (Mar 16, 2026):
After removing the proxy host from Gatus monitoring, the RAM appears to remain stable. It definitely has something to do with Gatus hitting the auth page...
@n1LWeb commented on GitHub (Mar 16, 2026):
Not specific to Gatus, but any repeated accesses, I think. Thanks for bringing Gatus to my attention though; it might be time to switch, as it looks like a better fit for my workflow than Uptime Kuma.
I don't check much of my pangolin services using Uptime Kuma, however; instead I'm using the health checks of Pangolin itself, and it makes a huge difference if I disable them. Sadly, I need them for failover.
The other thing that was new around the time the problems started is the country filtering. On the 24GB RAM VPS I'm using, pangolin becomes unresponsive as soon as it hits around 1GB RAM, way before my memory is full.
@cmmrandau commented on GitHub (Mar 19, 2026):
It did it again. This is on a vanilla setup with crowdsec. Only other services running are arcane agent and watchtower (and pulse agent as a systemd service).
@TubaApollo commented on GitHub (Mar 22, 2026):
I am also experiencing an issue with some kind of memory leak, so I tried to trace it down.
I took two V8 heap snapshots from the running server process (ee-1.16.2, Node v24.14.0), the second after forcing GC via HeapProfiler.collectGarbage:
- GzipServerResponse
- zlib_memory (native, ~263 KB each)
All leaked sockets trace back to the Next.js server.
The bulk of the leaked memory is native zlib allocations:
Container RSS after 1.5h: 1.54 GiB
V8 heap: 253 MB
Untracked native (zlib): ~1.2 GB
The compression is applied in node_modules/next/dist/server/lib/router-server.js.
Disabling compression in node_modules/next/dist/server/lib/router-server.js at line 109 helped in my case. A proper fix would likely be setting compress: false in Next.js's config, but I have only verified the direct patch above.
And so far I am seeing a massive improvement, maybe someone can confirm that. Not sure if it's related.
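For anyone wanting to see where their installed Next.js version wires up compression before patching anything, a read-only look inside the running container (assuming the app lives in /app, as the paths above suggest):

# Print the compression-related lines with their line numbers; the
# "line 109" above may differ between Next.js versions.
docker exec pangolin grep -n -i compress /app/node_modules/next/dist/server/lib/router-server.js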
@huzky-v commented on GitHub (Mar 22, 2026):
That's also what I observed in my load testing on the heap, and I had tried that disable-compression config before.
But the config is not respected, and the gzip response is still there.
@TubaApollo commented on GitHub (Mar 22, 2026):
I tried with the compress: false option passed to next() first. But it is ignored because router-server.js reads from the Next.js file config (next.config.js), not the constructor options. The only way I got it to work was patching router-server.js directly.
@huzky-v commented on GitHub (Mar 22, 2026):
My approach was to set https://github.com/fosrl/pangolin/blob/main/next.config.ts with the compression option disabled, but it did not work.
Don't know if I set the config wrong.
@TubaApollo commented on GitHub (Mar 22, 2026):
I rechecked. The config seems to be baked into /app/.next/required-server-files.json. You would need a full rebuild if you don't want to patch it. (Turns out this is wrong; this file is not read at runtime, I think?)
@huzky-v commented on GitHub (Mar 22, 2026):
My test always rebuilds the docker images for each test after making changes (including the next config), but it doesn't work.
Maybe I'll try to build that image again and check the files.
@huzky-v commented on GitHub (Mar 22, 2026):
The response still has gzip.
@TubaApollo commented on GitHub (Mar 22, 2026):
I am not fully sure, but although the config is baked into required-server-files.json, router-server.js never reads it? It calls loadConfig(), which looks for a physical next.config.js file on disk, and that file doesn't exist in the container. So it defaults to compress: true. So you will probably need to adjust the Dockerfile accordingly if you haven't already.
By adding something like this:
COPY --from=builder-dev /app/next.config.ts ./next.config.ts
But I would rather have someone with a bit more of a clue confirm this haha
@huzky-v commented on GitHub (Mar 22, 2026):
OK, with the idea from @TubaApollo, I finally managed to get the compress option in Next.js turned off.
The idea is to add a file next.override.ts.
EDIT: The image also does not need to be rebuilt. For existing pangolin users, just pass the next.override.ts file into the pangolin volume in the compose file to hot-patch it without rebuilding the image.
Please note that as gzip is disabled in Next.js, you may see a massive jump in bandwidth usage if you don't change the traefik config. To compensate for gzip being turned off in Next.js, offload the gzip compression to traefik: add a middleware in config/traefik/dynamic_config.yml. Don't know if there is a negative effect from the gzip middleware though.
My testing shows that there is no more zlib stuff in the heap dump, but I can't tell with my instance, as my instance is not resource limited.
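A quick way to verify from the outside which layer (if any) is compressing after such a change; a sketch with a placeholder domain:

# Request a page with gzip accepted and dump only the response headers;
# no content-encoding header means compression is off end-to-end.
curl -s -o /dev/null -D - -H 'Accept-Encoding: gzip' https://pangolin.example.com/ | grep -i content-encoding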
@huzky-v commented on GitHub (Mar 23, 2026):
Moved my pangolin instance to a 1G RAM VPS; we'll see what happens in a couple of days.
Current state is like this:
pangolin
(CPU): 5.45%
(MEM USAGE / LIMIT) 432.6MiB / 954.9MiB
(BLOCK I/O) 34.6GB / 3.73MB
EDIT: The instance crashed, still not working 😩
@Josh-Voyles commented on GitHub (Mar 23, 2026):
@huzky-v It didn't seem to make a difference for me. However, I'm not running the enterprise build.
I'm going to try disabling my uptime kuma checks. It's been brutal these last few weeks.
@TubaApollo commented on GitHub (Mar 25, 2026):
Mh, there might be another (possibly smaller) memory leak? Because for me it's definitely a lot better. I have 16GB of memory available, and before, it took all of them within a few days. Now I am at about 1.4GB since the fix, so it definitely did something.
@hansencheck24 commented on GitHub (Apr 10, 2026):
fix-memory-leak.patch
Can anyone help me with how to build a custom Pangolin image so I can test this patch? I would like to build a custom PostgreSQL variant of the image.