Serv command consumes all server resources #2059

Closed
opened 2025-11-02 04:22:38 -06:00 by GiteaMirror · 53 comments
Owner

Originally created by @xor-gate on GitHub (Jul 16, 2018).

Environment

- Gitea version (or commit ref): 1.4.3
- Git version: git version 2.1.4
- Operating system: Linux 3.16.0-4-amd64 #1 SMP Debian 3.16.51-3 (2017-12-13) x86_64 GNU/Linux
- Database: MySQL (MariaDB 10.1)
- Can you reproduce the bug at https://try.gitea.io: Not relevant
- Log gist: https://gist.github.com/xor-gate/baaee46f30a04c7f03ac5dd6c64ceeee

Description

We use Jenkins to poll for changes in git. It uses OpenSSH for access and Gitea spawns a serv command per poll. There are many concurrent connections (Jenkins is set to 10 concurrent SCM pollers) and the server's memory and CPU are blown up. It doesn't happen very often, but I'm not sure whether it's a deadlock/race condition in Gitea. We are going to upgrade to 1.4.3, but according to the changelog it will probably not fix the problem for us.

If you need more information we can hopefully provide it, as it only happens about once a day.

Screenshots

![knipsel](https://user-images.githubusercontent.com/1050166/42748883-ff6f5bf4-88e1-11e8-8761-8c06074adf67.PNG)
GiteaMirror added the type/miscellaneous and issue/critical labels 2025-11-02 04:22:38 -06:00
Author
Owner

@xor-gate commented on GitHub (Jul 16, 2018):

We have upgraded to v1.4.3. If it happens again, we will try to come up with some more details.

Author
Owner

@xor-gate commented on GitHub (Jul 17, 2018):

Still seems to be an issue multiple times a day with v1.4.3.

Author
Owner

@xor-gate commented on GitHub (Jul 17, 2018):

I think it is because we have only assigned 16 GB of memory and some swap. Every connection uses ~1500 MB of virtual memory and ~500 MB of resident memory. This is an insane amount of memory.

Author
Owner

@lafriks commented on GitHub (Jul 17, 2018):

Is there anything useful in gitea.log at such times?

Author
Owner

@xor-gate commented on GitHub (Jul 18, 2018):

I have had a look at the logfiles but nothing special is in there. I will keep an eye on it. But what about dumping debug pprof (https://jvns.ca/blog/2017/09/24/profiling-go-with-pprof/) to disk for the serv command to see the actual memory usage hotspots?

Author
Owner

@xor-gate commented on GitHub (Jul 27, 2018):

Hi guys, I have no exact explanation, but the problem has not been seen since we updated Jenkins. Still, I think it is good to have the ability to create pprof dumps per new serv process to figure out the memory usage, because the Linux OOM killer can come by, reap other processes and make the server unusable in cases where SSH is requested at a high rate (big deployments).

Author
Owner

@lafriks commented on GitHub (Jul 27, 2018):

Currently it is hard to find the cause for this without the ability to reproduce such behaviour.

Author
Owner

@xor-gate commented on GitHub (Jul 28, 2018):

I totally understand. Would it be possible for me to add a feature for pprof memory dumps to disk on exit, behind a special flag? Currently there is only an HTTP pprof endpoint, which cannot be used with short-lived serv executions under SSH. Then we could profile CPU and heap and draw better conclusions about this weird "smoking server" problem.

https://github.com/go-gitea/gitea/search?q=pprof&unscoped_q=pprof
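For illustration, a rough sketch of what such a flag could do; this is not the eventual implementation in #4560, and the output file names and directory handling here are assumptions: start a CPU profile when the command launches and write both the CPU and heap profiles to files on exit.

```
package main

import (
	"log"
	"os"
	"runtime"
	"runtime/pprof"
)

// startProfiling begins a CPU profile and returns a function that, when
// called on exit, stops the CPU profile and writes a heap profile.
// The "cpuprofile"/"memprofile" file names are placeholders for this sketch.
func startProfiling(dir string) func() {
	cpuFile, err := os.Create(dir + "/cpuprofile")
	if err != nil {
		log.Fatalf("could not create CPU profile: %v", err)
	}
	if err := pprof.StartCPUProfile(cpuFile); err != nil {
		log.Fatalf("could not start CPU profile: %v", err)
	}

	return func() {
		pprof.StopCPUProfile()
		cpuFile.Close()

		memFile, err := os.Create(dir + "/memprofile")
		if err != nil {
			log.Fatalf("could not create memory profile: %v", err)
		}
		defer memFile.Close()
		runtime.GC() // flush recent allocations into the heap profile
		if err := pprof.WriteHeapProfile(memFile); err != nil {
			log.Fatalf("could not write memory profile: %v", err)
		}
	}
}

func main() {
	stop := startProfiling(os.TempDir())
	defer stop()
	// ... the actual short-lived serv work would run here ...
}
```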

Author
Owner

@lafriks commented on GitHub (Jul 28, 2018):

Sure, it could help to have such an option to pprof this.

Author
Owner

@xor-gate commented on GitHub (Jul 30, 2018):

If no one else is going to work on this, shall I try to implement dumping pprof to disk in the upcoming weeks?

Author
Owner

@lafriks commented on GitHub (Jul 30, 2018):

Yes, go ahead

Author
Owner

@xor-gate commented on GitHub (Jul 30, 2018):

@lafriks an initial working implementation of the dumping is in PR #4560.

Author
Owner

@micw commented on GitHub (Aug 1, 2018):

I have the same issue - from time to time, hundreds of "gitea serv" processes eat all my system memory. How can I help to track it down?

Author
Owner

@xor-gate commented on GitHub (Aug 1, 2018):

Hi @micw, I started working on profiling this problem in #4560. Currently we cannot trace exactly what is going on, so I added pprof dumps to disk. More can be read at https://blog.golang.org/profiling-go-programs. I am using the https://github.com/google/pprof web interface visualizer.

Author
Owner

@xor-gate commented on GitHub (Aug 1, 2018):

When #4560 is merged (and released) we can share and analyze the pprof files offline and investigate further.

Author
Owner

@chihchi commented on GitHub (Aug 10, 2018):

Hi all, I use Gitea 1.3.2 and have the same issue; it uses almost all of the CPU on an AWS EC2 m4.medium.
Would you suggest I upgrade to 1.4.3?

platform 1195 0.4 2.0 972704 77268 ? Ssl Jul24 103:26 /home/platform/gitea/gitea web
platform 10094 23.6 13.5 707236 523232 ? Rsl 13:58 0:02 /home/platform/gitea/gitea serv key-64 --config=/home/platform/gitea/custom/conf/app.ini
platform 10135 23.5 20.3 1209440 782640 ? Rsl 13:58 0:01 /home/platform/gitea/gitea serv key-64 --config=/home/platform/gitea/custom/conf/app.ini
platform 10176 26.7 13.6 707236 524916 ? Ssl 13:58 0:01 /home/platform/gitea/gitea serv key-64 --config=/home/platform/gitea/custom/conf/app.ini
platform 10217 20.5 13.5 641700 522992 ? Ssl 13:58 0:00 /home/platform/gitea/gitea serv key-64 --config=/home/platform/gitea/custom/conf/app.ini

![2018-08-08 14 57 47](https://user-images.githubusercontent.com/2201233/43941088-5705dd8a-9ca5-11e8-8199-3c3107f362f4.png)

Author
Owner

@micw commented on GitHub (Aug 10, 2018):

@chihchi This issue causes high memory/IO consumption and occurs on 1.4.3, so I'm not sure that yours is the same issue. If it is, upgrading to 1.4.3 won't help.

Author
Owner

@micw commented on GitHub (Aug 10, 2018):

The issue occurred several times this week. It looks like the situation occurs when more "serv" processes are running than the server can handle. If no more requests follow, the server returns to normal after a while. But if clients keep querying git (like Jenkins does), more and more processes are launched, which consume all available resources (mainly memory).

Basically it's expected behaviour if clients consume more resources than the server can provide (a kind of DoS). But in this case, the amount of client activity is quite small (just a few Jenkins jobs polling for changes every minute) - that should be handled without problems.

As a workaround, I moved our git to a better server (fast SSD storage, 64 GB RAM, 12 CPU cores) - the issue has not occurred again since then.

Author
Owner

@lafriks commented on GitHub (Aug 10, 2018):

@chihchi you should try upgrading anyway; even though there has been no direct change to fix this, it could have been fixed as a side effect of other bug fixes.

Author
Owner

@micw commented on GitHub (Aug 10, 2018):

I need to update my experience - the issue also occurs on the bigger server today. It can handle the load better, but already has a load of >60. Since the system is still responsive, I can do some analysis:

ps fax | grep "gitea serv" | wc -l

-> only 40 gitea serv processes are running

ps fax | grep "gitea serv" | awk '{ print $1 }' | sort -n > list1
(wait a while)
ps fax | grep "gitea serv" | awk '{ print $1 }' | sort -n > list2
comm -12 list1 list2

Only a few processes (~15) last longer and consume a lot of resources (some consume 1 GB or more of memory). After a while (~10 minutes) all those processes finish and everything returns to normal.

Author
Owner

@xor-gate commented on GitHub (Aug 10, 2018):

@lafriks is it possible my pprof dump feature could be backported to 1.4.x?

Author
Owner

@lafriks commented on GitHub (Aug 11, 2018):

@xor-gate not for 1.4.x, as 1.5.0 was just released. We don't usually backport features to minor releases.

Author
Owner

@xor-gate commented on GitHub (Aug 13, 2018):

I understand

Author
Owner

@chihchi commented on GitHub (Aug 15, 2018):

@lafriks Hi all, I upgraded to 1.4.3 and it seems to resolve the high CPU load problem.
I will continue to monitor it.

![2018-08-15 13 48 18](https://user-images.githubusercontent.com/2201233/44133426-50df3992-a092-11e8-999a-4742e1637129.png)

Author
Owner

@micw commented on GitHub (Sep 3, 2018):

Just want to point out that @chihchi's issue was a different one (completely different, much older version). The issue described in this ticket still exists.

Author
Owner

@xor-gate commented on GitHub (Sep 3, 2018):

@micw if you update to 1.5.x it is possible to enable profiling

Author
Owner

@micw commented on GitHub (Sep 3, 2018):

The upgrade is scheduled for Wednesday. But I have no idea how to enable profiling or how to profile, so please let me know how I can provide information to you.

Author
Owner

@xor-gate commented on GitHub (Sep 3, 2018):

You should enable pprof (application wide) with these options: https://github.com/go-gitea/gitea/blob/master/custom/conf/app.ini.sample#L194-L199. It then produces files which can be loaded with https://github.com/google/pprof; running it with -gui fires up a web page where you can view memory or CPU profiles.

Author
Owner

@micw commented on GitHub (Sep 5, 2018):

Hi. I upgraded to 1.5.1 (docker image) and added this to the app.ini server section:

ENABLE_PPROF     = true
PPROF_DATA_PATH  = /data/gitea/pprof

I created the directory, changed its mode to 0777 and restarted the service - there are no profiling dumps and nothing is listening on :6060.

Author
Owner

@xor-gate commented on GitHub (Sep 5, 2018):

Hi @micw, I had a look and it seems 1.5.1 doesn't have pprof-to-disk support yet. It will probably be in the 1.5.2 release.

Author
Owner

@micw commented on GitHub (Sep 5, 2018):

Looking forward to it. I had a deeper look at the resource usage. It idles at 5-10% CPU and 100 MB of memory. When our Jenkins runs some polling, the load goes up to 800% (saturating all 8 CPU cores) and memory usage to >20 GB.

Author
Owner

@lunny commented on GitHub (Sep 5, 2018):

@xor-gate @micw that will be in v1.6.0. See #4560

Author
Owner

@micw commented on GitHub (Nov 2, 2018):

I upgraded to 1.5.3; the problem still exists and is getting worse. I have a bunch of git repos and a few (~30) Jenkins jobs polling them. This causes a fat server (10 CPU cores, 60 GB of RAM, all SSD) to overload.

Here's what top looks like after a few minutes; almost all of the 60 GB of memory is consumed:

 PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                                                   
   89 root      20   0       0      0      0 S  72.8  0.0   1:48.55 kswapd0                                                                                                                   
 6458 root      20   0   83616  29252      0 D  54.6  0.1   3:31.26 traefik                                                                                                                   
12528 1000      20   0 2805424 1.784g      0 D  53.4  3.6   0:24.38 gitea                                                                                                                     
 8651 1000      20   0 3724112 2.653g      0 S  53.1  5.4   1:57.13 gitea                                                                                                                     
10997 1000      20   0 3725004 1.982g      0 S  51.0  4.0   0:22.77 gitea                                                                                                                     
12036 1000      20   0 3725196 2.620g      0 R  38.5  5.3   0:24.64 gitea                                                                                                                     
12608 1000      20   0 3723700 2.035g      0 R  36.3  4.1   0:27.05 gitea                                                                                                                     
10209 1000      20   0 2805488 1.736g      0 R  33.9  3.5   0:35.03 gitea                                                                                                                     
 5442 root      20   0   35008  21812      0 S  32.5  0.0   1:25.07 rancher-metadat                                                                                                           
11904 1000      20   0 3723884 2.243g      0 S  32.0  4.6   0:11.30 gitea                                                                                                                     
 9649 1000      20   0 3723792 2.660g      0 S  27.6  5.4   1:27.40 gitea                                                                                                                     
11315 1000      20   0 2804820 1.122g      0 R  27.6  2.3   0:25.93 gitea                                                                                                                     
12274 1000      20   0 2805260 1.628g      0 R  27.2  3.3   0:26.52 gitea                                                                                                                     
11895 1000      20   0 2805352 1.733g      0 R  26.2  3.5   0:12.60 gitea                                                                                                                     
 6224 systemd+  20   0  137772  23580      0 S  24.0  0.0   0:35.00 consul                                                                                                                    
 5912 root      20   0  916660  16956      0 S  22.8  0.0   1:00.51 plugin-manager                                                                                                            
  694 root      20   0 2432388  55960      0 S  22.4  0.1   3:28.63 dockerd                                                                                                                   
12016 1000      20   0 3723856 2.660g      0 S  22.1  5.4   0:11.34 gitea                                                                                                                     
 3481 root      20   0  771312  16132      0 S  21.2  0.0   0:49.00 agent                                                                                                                     
13409 1000      20   0  968008 472448      0 R  20.0  0.9   0:02.36 gitea                                                                                                                     
13574 1000      20   0  967916 173440      0 D  18.3  0.3   0:00.76 gitea                                                                                                                     
13056 1000      20   0  968008 458040      0 R  18.0  0.9   0:02.80 gitea                                                                                                                     
 9711 1000      20   0 3723508 1.797g      0 S  16.8  3.7   1:14.19 gitea                                                                                                                     
13140 1000      20   0  969412 462064      0 R  16.8  0.9   0:01.87 gitea                                                                                                                     
13134 1000      20   0  969596 382320      0 D  16.1  0.7   0:02.61 gitea                                                                                                                     
13404 1000      20   0  967980 118044      0 D  15.1  0.2   0:02.01 gitea                                                                                                                     
12467 1000      20   0  968008 923588      0 R  14.9  1.8   0:04.36 gitea                                                                                                                     
12442 1000      20   0  968356 923724      0 R  14.7  1.8   0:04.68 gitea                                                                                                                     
 8664 1000      20   0 2228884 2.083g      0 S  13.9  4.2   1:51.24 gitea                                                                                                                     
13509 1000      20   0   49484   5120      0 R  13.0  0.0   0:01.05 gitea                                                                                                                     
12371 1000      20   0  968356 615704      0 S  12.0  1.2   0:04.23 gitea                                                                                                                     
13138 1000      20   0  968100 447456      0 R  12.0  0.9   0:02.70 gitea                                                                                                                     
 9907 1000      20   0 3723856 2.262g      0 S  11.8  4.6   1:58.84 gitea                                                                                                                     
12661 1000      20   0  967752 916976      0 R  11.3  1.8   0:03.79 gitea                                                                                                                     
13359 1000      20   0  968100 415048      0 R  11.3  0.8   0:02.02 gitea                                                                                                                     
13581 1000      20   0   49576   5160      0 R  11.3  0.0   0:00.47 gitea                                                                                                                     
12891 1000      20   0  968192 923508      0 R  11.1  1.8   0:03.08 gitea                                                                                                                     
 6556 root      20   0  890824  16828      0 S  10.8  0.0   0:07.86 dockerd                                                                                                                   
13496 1000      20   0  967908  99616      0 R  10.1  0.2   0:01.62 gitea                                                                                                                     
12941 1000      20   0 1886768 924720    856 R   9.9  1.8   0:02.93 gitea                                                                                                                     
13124 1000      20   0  968284 476140      0 R   9.9  0.9   0:02.41 gitea                                                                                                                     
12636 1000      20   0  968540 923748      0 S   9.1  1.8   0:03.26 gitea                                                                                                                     
13057 1000      20   0  969596 383820      0 R   8.9  0.7   0:02.65 gitea                                                                                                                     
12529 1000      20   0  968448 923612      0 R   8.7  1.8   0:22.79 gitea                                                                                                                     
13524 1000      20   0  966516  33820      0 D   8.7  0.1   0:01.04 gitea                                                                                                                     

This happens several times a day.

Author
Owner

@micw commented on GitHub (Nov 6, 2018):

@lunny @xor-gate Would it be possible to create a 1.5.x docker image that contains this profiling patch (or can I easily patch an existing image)? I'd really like to track this down; maybe we can fix this issue with the 1.6 release.

Author
Owner

@adelowo commented on GitHub (Nov 6, 2018):

@micw You can try out an RC for 1.6.0, which contains this fix. I am currently running an RC in production anyway.

https://github.com/go-gitea/gitea/releases/tag/v1.6.0-rc2

Author
Owner

@xor-gate commented on GitHub (Nov 6, 2018):

We are not backporting the profiling patch to 1.5.x as 1.6.0 will be released soon.

Author
Owner

@micw commented on GitHub (Nov 6, 2018):

Hi,

I ran the profiling on 1.6.0-rc2 but I cannot interpret the results.
What I have done:

  • waited until there was low load on the server
  • enabled profiling in the config
  • waited until there was a peak of many gitea processes
  • immediately disabled profiling in the config
  • waited until the last process of the peak terminated (~2 minutes)
  • looked into the last written files (those are from the processes that ran for ~2 minutes)

Here's the result:
[pprof.zip](https://github.com/go-gitea/gitea/files/2553163/pprof.zip)

CPU profiling says that the process ran for only 16 ms, which is definitely wrong.

Any ideas?

Author
Owner

@xor-gate commented on GitHub (Nov 6, 2018):

I think you are analyzing the wrong profiles. I have loaded the profiles into pprof (with https://github.com/go-gitea/gitea/releases/download/v1.6.0-rc2/gitea-1.6.0-rc2-linux-amd64 downloaded into $PWD).

PPROF_BINARY_PATH=$PWD pprof -http 0.0.0.0:1337 cpuprofile_eh_183397202
PPROF_BINARY_PATH=$PWD pprof -http 0.0.0.0:1337 memprofile_eh_520066697

https://github.com/google/pprof

See screenshots:

![knipsel](https://user-images.githubusercontent.com/1050166/48065729-cab0df80-e1cb-11e8-9938-1fe0c648c04d.PNG)
![knipsel1](https://user-images.githubusercontent.com/1050166/48065731-cab0df80-e1cb-11e8-8234-0385d6e6b0d7.PNG)

Author
Owner

@micw commented on GitHub (Nov 6, 2018):

All profiles that were written look like that. I did a quick pprof -text on each file and grepped for the duration; the one I uploaded had by far the longest duration.

If you look into the memory profile under alloc_space, you see that almost 3 GB was allocated. There are other profiles with >10 GB allocated.

Could that be the cause of the enormous load? And could it be that memory allocation (which is done by the OS, not by the Go code) is not part of the profiling?
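For reference, the alloc_space view can be read straight from the dump with the same pprof tool used above (the profile file name is the one from the commands earlier in this thread; exact flags may vary between pprof versions):

PPROF_BINARY_PATH=$PWD pprof -sample_index=alloc_space -top memprofile_eh_520066697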

Author
Owner

@micw commented on GitHub (Nov 6, 2018):

I'm not that familiar with Go, but to me https://github.com/go-gitea/gitea/blob/master/modules/log/file.go#L147 reads as if the whole logfile is read into memory during gitea serv.
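To make the suspected pattern concrete, a simplified sketch (not the actual Gitea code; the helper name and the serv.log path are only for illustration) of what counting log lines by reading the whole file looks like:

```
package main

import (
	"bytes"
	"fmt"
	"os"
)

// currentLines counts the lines of a log file by slurping the whole file
// into memory - with a multi-gigabyte serv.log, every short-lived process
// pays for the full file size in RAM and CPU.
func currentLines(path string) (int, error) {
	data, err := os.ReadFile(path) // the entire log file ends up in memory
	if err != nil {
		return 0, err
	}
	return bytes.Count(data, []byte{'\n'}), nil
}

func main() {
	n, err := currentLines("serv.log")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("lines:", n)
}
```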

Author
Owner

@micw commented on GitHub (Nov 6, 2018):

I wiped the serv.log and the problem is gone...

Author
Owner

@micw commented on GitHub (Nov 6, 2018):

Reviewed the code. Indeed the whole logfile is read. The purpose is log rotation. This should not be done in the logger, especially not if multiple processes log to the same file.

Author
Owner

@lafriks commented on GitHub (Nov 6, 2018):

Wow, that is crazy. Good find

Author
Owner

@xor-gate commented on GitHub (Nov 6, 2018):

It was a long story, but finally 🎉

Author
Owner

@micw commented on GitHub (Nov 6, 2018):

I'd say the story ends when there is a patch ;-)

I suggest the following changes:

  1. Don't read the whole file into memory just to count the lines - stream it instead (see the sketch below). This solves the memory issue but the high I/O remains. Still, it's a small change that can be applied ad hoc.

  2. Re-think log rotation
    Consider moving log rotation out of gitea's scope and using "logrotate" or whatever.

  3. (if you decide to keep log rotation in gitea) Fix log rotation

  • remove the "maximum lines per log file" feature. This is an anti-pattern because it requires parsing the whole file. Use the file size and/or age (ctime) instead.
  • remove the implicit log rotation. It does not play well in multi-process environments.
  • run a service task in the long-running gitea server process that rotates all log files. Don't do log rotation in "serv" processes.
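A minimal sketch of suggestion 1, plus the size check from suggestion 3; this is only illustrative, not a patch against the Gitea logger, and the function names and the 256 MB threshold are made up for the example:

```
package main

import (
	"fmt"
	"io"
	"os"
)

// countLines streams the file through a small buffer instead of reading it
// all into memory, so memory usage stays constant regardless of log size.
func countLines(path string) (int, error) {
	f, err := os.Open(path)
	if err != nil {
		return 0, err
	}
	defer f.Close()

	buf := make([]byte, 64*1024)
	lines := 0
	for {
		n, err := f.Read(buf)
		for _, b := range buf[:n] {
			if b == '\n' {
				lines++
			}
		}
		if err == io.EOF {
			return lines, nil
		}
		if err != nil {
			return 0, err
		}
	}
}

// needsRotation avoids counting lines entirely and looks at the file size,
// as suggested above.
func needsRotation(path string, maxSize int64) (bool, error) {
	info, err := os.Stat(path)
	if err != nil {
		return false, err
	}
	return info.Size() >= maxSize, nil
}

func main() {
	n, err := countLines("serv.log")
	if err != nil {
		fmt.Println(err)
		return
	}
	rotate, err := needsRotation("serv.log", 256<<20) // 256 MB threshold
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("lines:", n, "rotate:", rotate)
}
```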
Author
Owner

@adelowo commented on GitHub (Nov 6, 2018):

Oh boy, what a find. I agree file size should be used

Author
Owner

@adelowo commented on GitHub (Nov 6, 2018):

Or move log rotation totally out of gitea's responsibility

Author
Owner

@lafriks commented on GitHub (Nov 6, 2018):

I have submitted PR #5282 to remove the logfile max line count functionality, as a maximum file size should be more than enough for file rotation.

Author
Owner

@xor-gate commented on GitHub (Nov 6, 2018):

Yeah, I think this option is also not "normal": https://manpages.debian.org/jessie/logrotate/logrotate.8.en.html

Author
Owner

@micw commented on GitHub (Nov 6, 2018):

@xor-gate absolutely, I have never seen line-based log rotation before. Removing the "feature" should cause no issues. I can barely imagine any use case where line-based rotation is required ;)

Author
Owner

@micw commented on GitHub (Nov 6, 2018):

I think we can close this issue when https://github.com/go-gitea/gitea/pull/5282 is merged. I'll create a new one to rethink/rework log rotation.

Author
Owner

@xor-gate commented on GitHub (Nov 6, 2018):

As @lafriks mentions, the size should be enough to keep things within reasonable proportions without blowing up the storage. Thanks all for the teamwork!

Author
Owner

@micw commented on GitHub (Nov 6, 2018):

I created https://github.com/go-gitea/gitea/issues/5285 to track the rest of the log rotation that does not affect performance/memory usage.

Reference: github-starred/gitea#2059