Error: routers/repo/http.go:430:serviceRPC() [E] Fail to serve RPC(upload-pack): exit status 128 #4324

Closed
opened 2025-11-02 05:46:08 -06:00 by GiteaMirror · 24 comments
Owner

Originally created by @ianw on GitHub (Nov 15, 2019).

  • Gitea version (or commit ref): Gitea version v1.9.5 built with GNU Make 4.2.1, go1.13.4 : bindata,
  • Git version: git version 2.20.1
  • Operating system:
  • Database (use `[x]`):
    • [ ] PostgreSQL
    • [x] MySQL
    • [ ] MSSQL
    • [ ] SQLite
  • Can you reproduce the bug at https://try.gitea.io:
    • [ ] Yes (provide example URL)
    • [x] No
    • [ ] Not relevant

Description

One of our gitea hosts has started showing a constant error

2019/11/14 23:30:42 routers/repo/http.go:430:serviceRPC() [E] Fail to serve RPC(upload-pack): exit status 128 - {[102 97 116 97 108 58 32 116 104 101 32 114 101 109 111 116 101 32 101 110 100 32 104 117 110 103 32 117 112 32 117 110 101 120 112 101 99 116 101 100 108 121 10] 0 0}

The numbers are always the same, and it doesn't tell me which repo it is associated with. A slightly longer extract:

[Macaron] 2019-11-14 23:30:35: Completed POST /openstack/performance-docs/git-upload-pack 200 OK in 15.803779ms
2019/11/14 23:30:35 routers/repo/http.go:430:serviceRPC() [E] Fail to serve RPC(upload-pack): exit status 128 - {[102 97 116 97 108 58 32 116 104 101 32 114 101 109 111 116 101 32 101 110 100 32 104 117 110 103 32 117 112 32 117 110 101 120 112 101 99 116 101 100 108 121 10] 0 0}
[Macaron] 2019-11-14 23:30:35: Started POST /openstack/performance-docs/git-upload-pack for 38.108.68.124
[Macaron] 2019-11-14 23:30:35: Completed POST /openstack/performance-docs/git-upload-pack 200 OK in 21.116993ms
[Macaron] 2019-11-14 23:30:35: Started POST /openstack/openstack-helm-infra/git-upload-pack for 38.108.68.124
[Macaron] 2019-11-14 23:30:35: Completed GET /openstack/ironic-python-agent/info/refs?service=git-upload-pack 200 OK in 5.859975155s
[Macaron] 2019-11-14 23:30:35: Started GET /x/networking-fortinet/info/refs?service=git-upload-pack for 38.108.68.124
[Macaron] 2019-11-14 23:30:35: Started GET /openstack/designate/info/refs?service=git-upload-pack for 38.108.68.124
[Macaron] 2019-11-14 23:30:35: Completed GET /openstack/openstack-ansible-os_gnocchi/info/refs?service=git-upload-pack 200 OK in 1.336008289s
[Macaron] 2019-11-14 23:30:35: Started GET /openstack/tacker-specs/info/refs?service=git-upload-pack for 38.108.68.124
[Macaron] 2019-11-14 23:30:35: Started POST /openstack/openstack-ansible-os_gnocchi/git-upload-pack for 38.108.68.124
[Macaron] 2019-11-14 23:30:35: Completed POST /openstack/openstack-ansible-os_gnocchi/git-upload-pack 200 OK in 44.754598ms
2019/11/14 23:30:35 routers/repo/http.go:430:serviceRPC() [E] Fail to serve RPC(upload-pack): exit status 128 - {[102 97 116 97 108 58 32 116 104 101 32 114 101 109 111 116 101 32 101 110 100 32 104 117 110 103 32 117 112 32 117 110 101 120 112 101 99 116 101 100 108 121 10] 0 0}
[Macaron] 2019-11-14 23:30:35: Completed GET /x/networking-fortinet/info/refs?service=git-upload-pack 200 OK in 292.002759ms
[Macaron] 2019-11-14 23:30:35: Started POST /openstack/openstack-ansible-os_gnocchi/git-upload-pack for 38.108.68.124
[Macaron] 2019-11-14 23:30:35: Completed POST /openstack/openstack-ansible-os_gnocchi/git-upload-pack 200 OK in 29.21295ms
[Macaron] 2019-11-14 23:30:36: Completed POST /openstack/openstack-helm-infra/git-upload-pack 200 OK in 609.53835ms
[Macaron] 2019-11-14 23:30:36: Started GET /openstack/openstack-ansible-os_sahara/info/refs?service=git-upload-pack for 38.108.68.124
[Macaron] 2019-11-14 23:30:36: Completed GET /openstack/fuel-qa.git/info/refs?service=git-upload-pack 200 OK in 19.056143413s
[Macaron] 2019-11-14 23:30:36: Completed GET /openstack/tacker-specs/info/refs?service=git-upload-pack 200 OK in 760.904443ms
[Macaron] 2019-11-14 23:30:36: Started POST /openstack/tacker-specs/git-upload-pack for 38.108.68.124
[Macaron] 2019-11-14 23:30:36: Completed POST /openstack/tacker-specs/git-upload-pack 200 OK in 26.381308ms
2019/11/14 23:30:36 routers/repo/http.go:430:serviceRPC() [E] Fail to serve RPC(upload-pack): exit status 128 - {[102 97 116 97 108 58 32 116 104 101 32 114 101 109 111 116 101 32 101 110 100 32 104 117 110 103 32 117 112 32 117 110 101 120 112 101 99 116 101 100 108 121 10] 0 0}
[Macaron] 2019-11-14 23:30:36: Started POST /openstack/tacker-specs/git-upload-pack for 38.108.68.124

I have deleted all the repos and recreated them (using "Reinitialize all missing Git repositories for which records exist") but it still happens.

I then started to strace the gitea process to see what was going on. I managed to catch what I think is the cause of this error:

12027 execve("/usr/bin/git", ["/usr/bin/git", "upload-pack", "--stateless-rpc", "/data/git/repositories/x/docker-machine-openstack.git"], 0xc009602190 /* 8 vars */ <unfinished ...>
...
12027 write(2, "fatal: the remote end hung up unexpectedly\n", 43 <unfinished ...>
12027 <... write resumed> )             = 43
12027 exit_group(128 <unfinished ...>
12027 <... exit_group resumed>)         = ?
12027 +++ exited with 128 +++

This command matches what would be running at https://github.com/go-gitea/gitea/blob/master/routers/repo/http.go#L418 and has the 128 exit code, so I think it matches up. It would be really helpful if this was expressed in the error message.
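As an illustration of what a friendlier error message could look like, here is a minimal Python sketch (not Gitea's actual code) that captures the child process's stderr as text and logs it alongside the repository path. The failing child is faked with `sh` so the snippet runs without a real repository; the path is the one from the strace below.

```python
import subprocess

def run_git_service(cmd, repo_path):
    """Run a git service command and report failures with context.

    Decoding stderr as text and including the repository path turns an
    opaque 'exit status 128' into an actionable log line.
    """
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode != 0:
        print(f"Fail to serve RPC(upload-pack) for {repo_path}: "
              f"exit status {proc.returncode} - {proc.stderr.strip()}")
    return proc.returncode, proc.stderr

# Fake the failing child seen in the strace (no real repository needed):
code, err = run_git_service(
    ["sh", "-c", "echo 'fatal: the remote end hung up unexpectedly' >&2; exit 128"],
    "/data/git/repositories/x/docker-machine-openstack.git",
)
```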

I'm investigating the host to see if there's any networking type issues that might be affecting clients, although it doesn't look like it to me.

One user who seemed to hit this said that their clone was just stuck for hours; they didn't get a specific error message. They were using git 2.21.0.

@lunny commented on GitHub (Nov 15, 2019):

We need a more friendly log on that error.

@zeripath commented on GitHub (Nov 15, 2019):

I'm going to make an educated guess:

  • Please resynchronize hooks.
  • Please ensure that your INTERNAL_ROOT_URL makes sense - so that hooks can talk to your server.
  • Please ensure that your gitea-repositories are mounted in such a way that the hooks are executable.

I'll try to decode the log properly at some point - it's just ASCII bytes - if you could do that and put it up that would be great - and see if I can put up a pr to make log more reasonably. (It would be a single line pr.)

@ianw commented on GitHub (Nov 15, 2019):

I'll try to decode the log properly at some point - it's just ASCII bytes - if you could do that and put it up that would be great

Ha, OK that's a fun one. It works out to

ZmF0YWw6IHRoZSByZW1vdGUgZW5kIGh1bmcgdXAgdW5leHBlY3RlZGx5Cg==

which is base64 decoded

fatal: the remote end hung up unexpectedly

which is indeed what I saw from the strace. So that confirms that :) But it would be great if it correlated which repo/request it was coming from, as I still have no idea.
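For the record, the byte array in the log is plain ASCII, so both forms can be checked directly with a quick sketch:

```python
import base64

# The byte values printed by Gitea's log, copied verbatim:
raw = [102, 97, 116, 97, 108, 58, 32, 116, 104, 101, 32, 114, 101, 109, 111, 116,
       101, 32, 101, 110, 100, 32, 104, 117, 110, 103, 32, 117, 112, 32, 117, 110,
       101, 120, 112, 101, 99, 116, 101, 100, 108, 121, 10]
decoded = bytes(raw).decode("ascii")
print(decoded, end="")  # prints: fatal: the remote end hung up unexpectedly

# Cross-check against the base64 form quoted above:
b64 = "ZmF0YWw6IHRoZSByZW1vdGUgZW5kIGh1bmcgdXAgdW5leHBlY3RlZGx5Cg=="
assert base64.b64decode(b64) == bytes(raw)
```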

I don't think any of your suggestions have changed at all. This is part of an 8-node cluster fronted by haproxy, and it doesn't seem the other nodes are exhibiting this issue. I'm really not sure when it started, which I know isn't helpful :(

@ianw commented on GitHub (Nov 18, 2019):

This is a trace of the git upload-pack process that seems to die and cause this issue 16131.trace.txt

@tbreeds commented on GitHub (Nov 18, 2019):

I'm seeing the issue that @ianw is describing.

We have a local repo that passes git fsck --full without errors. Talking directly (not through haproxy) to multiple gitea servers shows that locally 'upload-pack' seems to block forever.

I'm using git version 2.21.0 from Fedora 30. I'm happy to run whatever debugging is needed to diagnose this.

@guillep2k commented on GitHub (Nov 19, 2019):

May be some git hook that takes too long? Git would abort on the client side and the server would show that error when the hook finishes.

@ianw commented on GitHub (Nov 19, 2019):

@guillep2k I don't think so; I'm not aware of any hooks, really, and you can see in the trace I put up before that it happens very quickly?

edit: looking again, there are no timestamps, so this is not at all obvious :) but it's not stalling, afaict

@guillep2k commented on GitHub (Nov 19, 2019):

Well, the remote end (i.e. the client) seems to be the one closing the connection, so if it's not because of a timeout, something in the response must be upsetting it. Perhaps the client has the problem? I imagine the following test: clone the repo in another machine (B) and add (B) as a remote for (A); then push your changes from (A) to (B) and see if you succeed; then you can also try pushing from (B) to Gitea after that. Otherwise it's worth looking for more info on the client side to see what's making it abort the connection.

@tbreeds commented on GitHub (Nov 19, 2019):

The client still has an open socket to the gitea server:

[tony@thor ~]$ lsof -np 5978
COMMAND  PID USER   FD   TYPE   DEVICE  SIZE/OFF     NODE NAME
git     5978 tony  cwd    DIR    253,3      4096 10365512 /home/tony/projects/OSF/openstack/nova_bad
git     5978 tony  rtd    DIR    253,1      4096        2 /
git     5978 tony  txt    REG    253,1   3430384  2011608 /usr/bin/git
git     5978 tony  mem    REG    253,3 133470802 10093721 /home/tony/projects/OSF/openstack/nova_bad/.git/objects/pack/pack-2650373f98951c72fb12e7d0a68640cd8d7fa76a.pack
git     5978 tony  mem    REG    253,3    681879 10093737 /home/tony/projects/OSF/openstack/nova_bad/.git/packed-refs
git     5978 tony  mem    REG    253,3  18517192 10093724 /home/tony/projects/OSF/openstack/nova_bad/.git/objects/pack/pack-2650373f98951c72fb12e7d0a68640cd8d7fa76a.idx
git     5978 tony  mem    REG    253,1 217750512  1966387 /usr/lib/locale/locale-archive
git     5978 tony  mem    REG    253,1   6697832  1972738 /usr/lib64/libc-2.29.so
git     5978 tony  mem    REG    253,1    149504  1973332 /usr/lib64/librt-2.29.so
git     5978 tony  mem    REG    253,1    823192  1972989 /usr/lib64/libpthread-2.29.so
git     5978 tony  mem    REG    253,1    131200  1966784 /usr/lib64/libz.so.1.2.11
git     5978 tony  mem    REG    253,1    630304  1977533 /usr/lib64/libpcre2-8.so.0.8.0
git     5978 tony  mem    REG    253,1    436736  1972262 /usr/lib64/ld-2.29.so
git     5978 tony    0r  FIFO     0,13       0t0 12430326 pipe
git     5978 tony    1w  FIFO     0,13       0t0 12430327 pipe
git     5978 tony    2u   CHR   136,16       0t0       19 /dev/pts/16
git     5978 tony    3u  IPv4 12430315       0t0      TCP 192.168.1.14:41436->38.108.68.172:hbci (ESTABLISHED)

The last thing git (with GIT_CURL_VERBOSE) prints is:

* We are completely uploaded and fine
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Content-Type: application/x-git-upload-pack-result
< Set-Cookie: lang=en-US; Path=/; Max-Age=2147483647
< Set-Cookie: i_like_gitea=198158244041ff80; Path=/; HttpOnly
< Set-Cookie: _csrf=SKqrpPpbdfEPlzwfA3rP7JNLK4E6MTU3NDEyNTU4NTk0ODA1Njk1MA%3D%3D; Path=/; Expires=Wed, 20 Nov 2019 01:06:25 GMT; HttpOnly
< X-Frame-Options: SAMEORIGIN
< Date: Tue, 19 Nov 2019 01:06:26 GMT
< Transfer-Encoding: chunked
< 
* Connection #0 to host gitea01.opendev.org left intact

So AFAICT the client isn't dropping the connection; an strace on the client doesn't show any calls to close() on the fd.

@guillep2k commented on GitHub (Nov 19, 2019):

OK, but you can see that:

12027 execve("/usr/bin/git", ["/usr/bin/git", "upload-pack", "--stateless-rpc", "/data/git/repositories/x/docker-machine-openstack.git"], 0xc009602190 /* 8 vars */ <unfinished ...>
...
12027 write(2, "fatal: the remote end hung up unexpectedly\n", 43 <unfinished ...>

execve means that the process will transfer into /usr/bin/git, so anything from there on is git's doing. And that process (git) is printing the "fatal: ..." error. I cannot however pinpoint from your strace which system call is currently returning any errors (they all seem to return normal >= 0 values).

Maybe your repo is not in a local file system?

@ianw commented on GitHub (Nov 19, 2019):

@guillep2k all the repos are local; gitea is running in a container and has access to the repos that way. The only thing pushing to gitea is gerrit as it replicates. Otherwise it's read traffic. The interesting thing is that there's a TCP load-balancer between each gitea node. However, haproxy does not appear to be showing any sort of errors

@guillep2k commented on GitHub (Nov 19, 2019):

I don't understand: why are there two instances of gitea? A load balancer suggests a single file-system shared between the instances, and that should give you all kinds of problems (not necessarily this one, however). Anyway, it seems to be git, not Gitea, that's emitting the errors.

As a last measure, I'd consider upgrading to 1.9.6. There has been some rework in the way Gitea handles git resources, specifically to avoid keeping open handles (#8901, #8958). This may be related to your issue.

@ianw commented on GitHub (Nov 19, 2019):

@guillep2k there's no shared file-system in this case; each separate gitea node is on its own server. gerrit's repos are the one-true-source, and it replicates changes out to the gitea nodes -- it is the only writer. I agree that it looks like git is dying here, but it's very odd that this just started happening.

We are seeing this across multiple servers, however. It's not just one.

I agree we're not being super helpful here as I don't know exactly when this started. I've tried to dump the connection, but it's all just encrypted binary stuff which will take a whole heap of effort to decode in a tracer. But I didn't see anything obvious.

I think we'll have to try upgrading at this point, and if we see the issue continue with 1.9.6 dig deeper.

@tbreeds commented on GitHub (Nov 22, 2019):

@ianw and team upgraded to 1.9.6 and the problem persists.

With the repo below, we've recreated the "hang" on a number of Linux distros and git versions.

Grab the repo @ https://ozlabs.org/~tony/nova_bad.tar.gz and unpack it and then run something like:

GIT_TRACE=2 GIT_CURL_VERBOSE=2 GIT_TRACE_PERFORMANCE=2 GIT_TRACE_PACK_ACCESS=2 GIT_TRACE_PACKET=2 GIT_TRACE_PACKFILE=2 GIT_TRACE_SETUP=2 GIT_TRACE_SHALLOW=2 git -c http.sslVerify=false remote -v update

It's entirely possible that there is a problem with that repo that git fsck doesn't detect, but it seems like the client connects to gitea, which execs git; git dies, but gitea doesn't notice and doesn't clean up the client connection.

Any ideas of what else to try? If not, I'll try using a local gitea server at some point next week.

@everflux commented on GitHub (Dec 2, 2019):

I suspect that I have the same problem: an existing gitea repo worked fine until some commit. After that, pushing hangs and the git remote http process hogs a complete CPU core. After 8 minutes a timeout occurs. I deleted the repo and tried to do a clean push; same problem.
Tried aggressive git gc and a full fsck; that did not help.
Pushing the very same repo to gitlab works without a problem, though it is pushed using ssh instead of https.
Any way to triage this? (I run gitea with sqlite in kubernetes on ARM64 behind traefik as reverse proxy and another traefik as ingress, but I doubt that this has to do with the issue. I think it would help to know which commit triggered the problem, but I have no idea how to search for it.)

@lunny commented on GitHub (Dec 2, 2019):

@tbreeds @everflux Is that repository a public one? If you can share it or find a similar project, that would help to find the problem.

@everflux commented on GitHub (Dec 2, 2019):

Unfortunately not - I know it is frustrating.
But if there is a script or something to triage the exact commit I am happy to help out. (Something like history replay and try-push each commit?)

@tbreeds commented on GitHub (Dec 2, 2019):

@tbreeds @everflux Is that repository a public one? If you can share it or find a simliar project, that would help to find the problem.

The repo I'm using is public and I have a tarball of my clone that seems to reliably reproduce the git issue on a variety of clients/distributions

@everflux commented on GitHub (Dec 16, 2019):

Can you share the project for @lunny to be able to triage this?

@tbreeds commented on GitHub (Dec 17, 2019):

@everflux @lunny I'm not sure exactly what you need me to share?

@everflux commented on GitHub (Dec 19, 2019):

I assume it would be helpful if you could share your repo and identify the commit from which point the gitea push fails.

@zeripath commented on GitHub (Jan 4, 2020):

Hi peeps! So I finally got round to looking at this.

First of all, I can't replicate this on master with SQLite. I do suspect, however, that this might be another example of the TCP port exhaustion issue, as I see that you're using MySQL.

You will need to set MAX_OPEN_CONNS, MAX_IDLE_CONNS and CONN_MAX_LIFETIME. In particular you should set MAX_OPEN_CONNS = MAX_IDLE_CONNS.
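These settings live in Gitea's app.ini under the `[database]` section; a minimal sketch with example values only (size them against your MySQL connection limits):

```ini
[database]
DB_TYPE           = mysql
; Example values only -- tune against your MySQL max_connections.
MAX_OPEN_CONNS    = 20
; Per the advice above, keep MAX_IDLE_CONNS equal to MAX_OPEN_CONNS.
MAX_IDLE_CONNS    = 20
CONN_MAX_LIFETIME = 3m
```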

@everflux commented on GitHub (Jan 4, 2020):

For me it seems that the issue was fixed by forcing the workdir permissions on all files to be reset. It was not SQLite but the git repository itself, it seems.

@zeripath commented on GitHub (Jan 4, 2020):

@everflux if the permissions on the git repository were messed up I could imagine that git would fail in this way.

Reference: github-starred/gitea#4324