LFS: Cloning objects / batch not found #4008

Closed
opened 2025-11-02 05:33:53 -06:00 by GiteaMirror · 49 comments
Owner

Originally created by @gabyx on GitHub (Sep 24, 2019).

  • Gitea version: Bug contained in 1.8.3 - 1.9.3
  • Git version: 2.23.0 (local)
  • Operating system: Gitea (Linux, Docker); pushing the repo from Windows
  • Database (use `[x]`):
    • [ ] PostgreSQL
    • [ ] MySQL
    • [x] MSSQL
    • [ ] SQLite
  • Can you reproduce the bug at https://try.gitea.io:
    • [x] Yes (provide example URL)
    • [ ] No
    • [ ] Not relevant

Description

When I upload a repo with LFS objects, the upload mostly works.
While cloning, however, the LFS smudge filter always stalls
after some time (here at 58%), saying:

![image](https://user-images.githubusercontent.com/647437/65674839-803b8a00-e04d-11e9-92c4-a4f791869884.png)

After a night of debugging (updating successively through all versions with Docker),
we came to the conclusion that

  • this issue arises in all versions from 1.8.3 through 1.9.3;
  • versions 1.7.4 - 1.8.2 all work correctly;
  • setting the repository to private or public did not help (version 1.8.3).

Could it be that the following submissions into 1.8.3 are problematic:

  • Always set userID on LFS authentication (#7224) (Part of #6993)
  • Fix LFS Locks over SSH (#6999) (#7223)

The hints/workarounds in the discussion below did not solve this issue:
https://discourse.gitea.io/t/solved-git-lfs-upload-repeats-infinitely/635/2

Hopefully this gets some attention, since it's a nasty LFS bug which almost turned us into apple crumble. 🍎

GiteaMirror added the type/bug label 2025-11-02 05:33:53 -06:00

@m-a-v commented on GitHub (Sep 27, 2019):

I've made some more tests. After compiling the version at commit dbd0a2e (Fix LFS Locks over SSH (#6999) (#7223)), the error appears. The LFS data is large (approximately 10 GB). One commit before (7697a28) everything works perfectly.

I've tried to disable the SSH server. But this doesn't change anything.

@zeripath Let me know if you need more information.

@m-a-v commented on GitHub (Sep 27, 2019):

Here you can see the debug log output when the error occurs: PANIC:: runtime error: invalid memory address or nil pointer dereference:

```
2019/09/27 20:44:19 [D] Could not find repository: company/repository - dial tcp 172.18.0.6:3306: connect: cannot assign requested address
2019/09/27 20:44:19 [D] LFS request - Method: GET, URL: /company/repository.git/info/lfs/objects/063e23a8631392cc939b6b609df91e02d064f3fe279522c3eefeb1c5f1d738a3, Status 404
2019/09/27 20:44:19 [...les/context/panic.go:36 1()] [E] PANIC:: runtime error: invalid memory address or nil pointer dereference
/usr/local/go/src/runtime/panic.go:82 (0x44abc0)
/usr/local/go/src/runtime/signal_unix.go:390 (0x44a9ef)
/go/src/code.gitea.io/gitea/models/repo_permission.go:120 (0x108a0ed)
/go/src/code.gitea.io/gitea/models/repo_permission.go:120 (0x108a0ed)
/go/src/code.gitea.io/gitea/models/repo_permission.go:95 (0x1183338)
/go/src/code.gitea.io/gitea/modules/lfs/server.go:501 (0x118330a)
/go/src/code.gitea.io/gitea/modules/lfs/server.go:128 (0x117f2dd)
/go/src/code.gitea.io/gitea/modules/lfs/server.go:146 (0x117f468)
/go/src/code.gitea.io/gitea/modules/lfs/server.go:105 (0x117ef90)
/usr/local/go/src/reflect/value.go:447 (0x4cb930)
/usr/local/go/src/reflect/value.go:308 (0x4cb3b3)
/go/src/code.gitea.io/gitea/vendor/github.com/go-macaron/inject/inject.go:177 (0x9a1466)
/go/src/code.gitea.io/gitea/vendor/github.com/go-macaron/inject/inject.go:137 (0x9a0d5b)
/go/src/code.gitea.io/gitea/vendor/gopkg.in/macaron.v1/context.go:121 (0x9cff19)
/go/src/code.gitea.io/gitea/vendor/gopkg.in/macaron.v1/context.go:112 (0x11667e8)
/go/src/code.gitea.io/gitea/modules/context/panic.go:40 (0x11667db)
/usr/local/go/src/reflect/value.go:447 (0x4cb930)
/usr/local/go/src/reflect/value.go:308 (0x4cb3b3)
/go/src/code.gitea.io/gitea/vendor/github.com/go-macaron/inject/inject.go:177 (0x9a1466)
/go/src/code.gitea.io/gitea/vendor/github.com/go-macaron/inject/inject.go:137 (0x9a0d5b)
/go/src/code.gitea.io/gitea/vendor/gopkg.in/macaron.v1/context.go:121 (0x9cff19)
/go/src/code.gitea.io/gitea/vendor/gopkg.in/macaron.v1/context.go:112 (0x9efe76)
/go/src/code.gitea.io/gitea/vendor/github.com/go-macaron/session/session.go:192 (0x9efe61)
/go/src/code.gitea.io/gitea/vendor/gopkg.in/macaron.v1/context.go:79 (0x9cfdc0)
/go/src/code.gitea.io/gitea/vendor/github.com/go-macaron/inject/inject.go:157 (0x9a1120)
/go/src/code.gitea.io/gitea/vendor/github.com/go-macaron/inject/inject.go:135 (0x9a0e4a)
/go/src/code.gitea.io/gitea/vendor/gopkg.in/macaron.v1/context.go:121 (0x9cff19)
/go/src/code.gitea.io/gitea/vendor/gopkg.in/macaron.v1/context.go:112 (0x9e197f)
/go/src/code.gitea.io/gitea/vendor/gopkg.in/macaron.v1/recovery.go:161 (0x9e196d)
/go/src/code.gitea.io/gitea/vendor/gopkg.in/macaron.v1/logger.go:40 (0x9d3bb3)
/go/src/code.gitea.io/gitea/vendor/github.com/go-macaron/inject/inject.go:157 (0x9a1120)
/go/src/code.gitea.io/gitea/vendor/github.com/go-macaron/inject/inject.go:135 (0x9a0e4a)
/go/src/code.gitea.io/gitea/vendor/gopkg.in/macaron.v1/context.go:121 (0x9cff19)
/go/src/code.gitea.io/gitea/vendor/gopkg.in/macaron.v1/context.go:112 (0x9e0ca0)
/go/src/code.gitea.io/gitea/vendor/gopkg.in/macaron.v1/logger.go:52 (0x9e0c8b)
/go/src/code.gitea.io/gitea/vendor/gopkg.in/macaron.v1/logger.go:40 (0x9d3bb3)
/go/src/code.gitea.io/gitea/vendor/github.com/go-macaron/inject/inject.go:157 (0x9a1120)
/go/src/code.gitea.io/gitea/vendor/github.com/go-macaron/inject/inject.go:135 (0x9a0e4a)
/go/src/code.gitea.io/gitea/vendor/gopkg.in/macaron.v1/context.go:121 (0x9cff19)
/go/src/code.gitea.io/gitea/vendor/gopkg.in/macaron.v1/router.go:187 (0x9e2bc6)
/go/src/code.gitea.io/gitea/vendor/gopkg.in/macaron.v1/router.go:303 (0x9dc635)
/go/src/code.gitea.io/gitea/vendor/gopkg.in/macaron.v1/macaron.go:220 (0x9d4f8c)
/go/src/code.gitea.io/gitea/vendor/github.com/gorilla/context/context.go:141 (0xce374a)
/usr/local/go/src/net/http/server.go:1995 (0x6f63a3)
/usr/local/go/src/net/http/server.go:2774 (0x6f9677)
/usr/local/go/src/net/http/server.go:1878 (0x6f5360)
/usr/local/go/src/runtime/asm_amd64.s:1337 (0x464c20)
2019/09/27 20:44:19 [D] Template: status/500
2019/09/27 20:44:19 [...les/context/panic.go:36 1()] [E] PANIC:: runtime error: invalid memory address or nil pointer dereference
[identical stack trace repeated]
```

@m-a-v commented on GitHub (Sep 28, 2019):

I suppose that Gitea is exceeding the number of local socket connections permitted by the OS.

Failure: cannot assign requested address

See also explanation and possible solution here:
https://github.com/golang/go/issues/16012#issuecomment-224948823

Where could I change the setting MaxIdleConnsPerHost and other LFS server settings to make further tests?

@m-a-v commented on GitHub (Sep 28, 2019):

BTW: The error PANIC:: runtime error: invalid memory address or nil pointer dereference does not always appear in the log output. Sometimes the server and client just hang.

@m-a-v commented on GitHub (Sep 28, 2019):

@lunny Who could help to isolate this bug? Is there any Gitea programmer who could support us? I am willing to make more tests but I need some hints.

@gabyx commented on GitHub (Sep 29, 2019):

@m-a-v: There is also a setting:

```
git -c lfs.concurrenttransfers=5 clone
```

which will probably affect the transfer; nevertheless, it should not crash the server...

@gabyx commented on GitHub (Sep 29, 2019):

Another interesting read: https://www.fromdual.com/huge-amount-of-time-wait-connections

  • Check ulimit, maxfiles, and somaxconn. Possibly the system runs out of these limited resources. Link: https://stackoverflow.com/questions/47385692/limited-concurrent-connections-in-go

@lunny commented on GitHub (Sep 30, 2019):

@m-a-v I think @zeripath maybe. But if not, I can take a look at this.

@m-a-v commented on GitHub (Sep 30, 2019):

The problem seems to be the huge number of connections for the GET requests (more than 10k connections for one single client!). See also here:

https://medium.com/@valyala/net-http-client-has-the-following-additional-limitations-318ac870ce9d.
https://medium.com/@nate510/don-t-use-go-s-default-http-client-4804cb19f779

@zeripath commented on GitHub (Oct 10, 2019):

@m-a-v I've been very busy doing other things for a while so have been away from Gitea. I'll take a look at this.

I think you're on the right trail with the number of connections thing. IIRC there's another person who had a similar issue.

@zeripath commented on GitHub (Oct 10, 2019):

@m-a-v I can't understand why dbd0a2e should break things, but I'll double check.

Maybe the request body isn't being closed, or something stupid like that. That would cause a leak and could explain the issue.

The other possibility is that dbd0a2e has nothing to do with it, and this is a Heisenbug related to the number of connections.

@guillep2k commented on GitHub (Oct 10, 2019):

A `netstat -an` could be useful to see what state the connections are in when this happens. You don't need to make Gitea fail; it will be useful as long as there is a large number of connections listed. It's not the same if the connections are in CONNECTED state, or CLOSE_WAIT, FIN_WAIT1, etc.

@zeripath commented on GitHub (Oct 10, 2019):

OK, so all these calls to ReadCloser() don't Close():

https://github.com/go-gitea/gitea/blob/57b0d9a38ba7d8dcc05a74fe39ab9f9e765ed8b3/modules/lfs/server.go#L330

https://github.com/go-gitea/gitea/blob/57b0d9a38ba7d8dcc05a74fe39ab9f9e765ed8b3/modules/lfs/server.go#L437

https://github.com/go-gitea/gitea/blob/57b0d9a38ba7d8dcc05a74fe39ab9f9e765ed8b3/modules/lfs/server.go#L456

Whether that's the cause of your bug is another question - however, it would fit with dbd0a2e causing more issues because suddenly you get a lot more calls to unpack.

These should be closed so I guess that's at least a starting point for attempting to fix this. (If I find anything else I will update this.)

@zeripath commented on GitHub (Oct 10, 2019):

@m-a-v would you be able to rebuild from my PR #8454 and see if that solves your issue?

@m-a-v commented on GitHub (Oct 11, 2019):

@zeripath Thanks a lot. It may take some time until I can test it, but I certainly will.

@zeripath commented on GitHub (Oct 12, 2019):

It's actually been merged into the 1.10 and 1.9 branches already.

@m-a-v commented on GitHub (Oct 15, 2019):

I've tested it again with 1.10, and it seems that the described LFS bug has been solved, or at least the error no longer appears in this specific scenario. Before @zeripath's fix we had more than 10k connections in the TIME_WAIT state. Now there are still approximately 3.5k connections in the TIME_WAIT state. I assume that if multiple clients access the LFS server, the same problem could still occur.

Any idea how to improve this? Are there other possible leaks? I assume that a connection which closes will not remain in a TIME_WAIT state. Can anyone confirm this?

@zeripath commented on GitHub (Oct 15, 2019):

Hi @m-a-v, I guess this means that I must have missed some others. Is there any way of checking that they're all LFS connections?

@m-a-v commented on GitHub (Oct 15, 2019):

Indirectly, yes. I had only one active client. Before LFS checkout I had two connections on the MariaDB database server instance. During LFS checkout about 3.5k connections and then some minutes later again 2 connections.

This article could be interesting:
http://www.serverframework.com/asynchronousevents/2011/01/time-wait-and-its-design-implications-for-protocols-and-scalable-servers.html

@zeripath commented on GitHub (Oct 15, 2019):

LFS checkout causes 3.5K connections?! How many LFS objects do you have?

@m-a-v commented on GitHub (Oct 15, 2019):

12k LFS objects.

@m-a-v commented on GitHub (Oct 15, 2019):

The error appeared again. I have to check this later. Probably next week.

@zeripath commented on GitHub (Oct 15, 2019):

So I've spotted another unclosed thing. It's unlikely to be causing your issue; however, I am suspicious that we're not closing the response body in modules/lfs/server.go.

@guillep2k commented on GitHub (Oct 15, 2019):

From "What are CLOSE_WAIT and TIME_WAIT states?" (https://superuser.com/questions/173535/what-are-close-wait-and-time-wait-states):

    TIME_WAIT indicates that local endpoint (this side) has closed the connection.

I think you may have a network configuration problem. TIME_WAIT lingering too long is a common problem for web servers, usually because the default timeout is too long. Search around, because there are many documents dealing with this. Just a "first to show up in a search" pick:

  • Huge amount of TIME_WAIT connections: https://www.fromdual.com/huge-amount-of-time-wait-connections
  • Avoid TIME_WAIT connections: https://serverfault.com/questions/478691/avoid-time-wait-connections
  • Troubleshoot port exhaustion issues: https://docs.microsoft.com/en-us/windows/client-management/troubleshoot-tcpip-port-exhaust
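
On Linux servers, the usual TIME_WAIT mitigations from links like those are kernel-level. A commonly cited sysctl fragment looks like the following (a sketch, assuming a Linux host; verify the semantics against your kernel's ip-sysctl documentation before applying):

```
# Widen the ephemeral port range so TIME_WAIT sockets exhaust it later
net.ipv4.ip_local_port_range = 15000 65000
# Allow reusing sockets in TIME_WAIT state for new outgoing connections
net.ipv4.tcp_tw_reuse = 1
```

This only treats the symptom, though; if an application leaks or churns connections, fixing the application is the real cure.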
Author
Owner

@guillep2k commented on GitHub (Oct 15, 2019):

@zeripath Any connections that Gitea leaves open should remain in either ESTABLISHED or CLOSE_WAIT states.


@zeripath commented on GitHub (Oct 15, 2019):

Could it be that git lfs on the client is also leaking connections?


@guillep2k commented on GitHub (Oct 15, 2019):

> Could it be that git lfs on the client is also leaking connections?

That would be either FIN_WAIT_1 or FIN_WAIT_2.

TIME_WAIT is a state maintained by the OS to keep the port from being reused (by "port" I mean the client+server address & port pair).


@guillep2k commented on GitHub (Oct 15, 2019):

This picture should help (but it's not easy to read, so I guess it doesn't):

![TCP state diagram](https://user-images.githubusercontent.com/18600385/66850795-e120f880-ef4f-11e9-8aa2-f92de039aa0c.png)


@m-a-v commented on GitHub (Oct 15, 2019):

I think the problem is more the following:

"Your problem is that you are not reusing your MySQL connections within your app but instead you are creating a new connection every time you want to run an SQL query. This involves not only setting up a TCP connection, but then also passing authentication credentials across it. And this is happening for every query (or at least every front-end web request) and it's wasteful and time consuming."

I think this would also speed up Gitea's LFS server a lot.

source: https://serverfault.com/questions/478691/avoid-time-wait-connections


@zeripath commented on GitHub (Oct 15, 2019):

AHA! Excellent! Well done for finding that!


@zeripath commented on GitHub (Oct 15, 2019):

OK, we do recycle connections. We use the underlying Go sql connection pool.

For MySQL there are the following settings in the `[database]` part of app.ini:

- `MAX_IDLE_CONNS` (default 0): max idle database connections in the connection pool
- `CONN_MAX_LIFETIME` (default 3s): database connection max lifetime

https://docs.gitea.io/en-us/config-cheat-sheet/#database-database

I think `MAX_IDLE_CONNS` was set to 0 because MySQL doesn't like long-lasting connections.

~~I will however make a PR, exposing SetConnMaxLifetime.~~ Edit: I'm an idiot, it's already exposed for MySQL.


@zeripath commented on GitHub (Oct 15, 2019):

I think you need to tune those variables better. Our defaults are highly likely to be incorrect; however, they were probably set this way because other users complained of problems.

I suspect that `MAX_IDLE_CONNS` being set to 0 happened before we adjusted `CONN_MAX_LIFETIME`, and we could be more generous with both. Something like `MAX_IDLE_CONNS = 10` and `CONN_MAX_LIFETIME = 15m` might work.
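Spelled out as a config fragment, that suggestion would look roughly like this in app.ini (the values are zeripath's guess above, not vetted defaults; tune them for your own MySQL setup):

```ini
; [database] section of app.ini -- illustrative values from the suggestion above
[database]
MAX_IDLE_CONNS = 10
CONN_MAX_LIFETIME = 15m
```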


@m-a-v commented on GitHub (Oct 21, 2019):

I could test it again with the repo. Which branch should I take? Which parameters (I've seen that discussions continued)?


@m-a-v commented on GitHub (Oct 21, 2019):

> So I've spotted another unclosed thing, which is unlikely to be causing your issue; however, I am suspicious that we're not closing the response body in `modules/lfs/server.go`.

Did you also fix this?


@m-a-v commented on GitHub (Oct 31, 2019):

I have made several experiments with the currently running Gitea server (v1.7.4 and the new version v1.9.5). The netstat snapshots were created at the peak of the number of open connections.

**Version 1.7.4**

```
root@917128b828cb:/# netstat -ant | awk '{print $6}' | sort | uniq -c | sort -n
      1 Foreign
      1 established)
      2 ESTABLISHED
      2 LISTEN
    162 TIME_WAIT
```

**Version 1.9.5 (same default settings as 1.7.4)**

```
bash-5.0# netstat -ant | awk '{print $6}' | sort | uniq -c | sort -n
      1 ESTABLISHED
      1 Foreign
      1 established)
      5 LISTEN
  30064 TIME_WAIT
```

**Version 1.9.5 (CONN_MAX_LIFETIME = 45s, MAX_IDLE_CONNS = 10, MAX_OPEN_CONNS = 10)**

```
bash-5.0# netstat -ant | awk '{print $6}' | sort | uniq -c | sort -n
      1 ESTABLISHED
      1 Foreign
      1 established)
      5 LISTEN
  31095 TIME_WAIT
```

With both configurations the LFS server has far too many open connections, so I think we still have serious problems with large LFS repos.

```
$ git clone https://domain.org/repo.git test
Cloning into 'test'...
remote: Enumerating objects: 157392, done.
remote: Counting objects: 100% (157392/157392), done.
remote: Compressing objects: 100% (97424/97424), done.
remote: Total 157392 (delta 63574), reused 151365 (delta 57755)
Receiving objects: 100% (157392/157392), 6.99 GiB | 57.68 MiB/s, done.
Resolving deltas: 100% (63574/63574), done.
Updating files: 100% (99264/99264), done.
Filtering content:  53% (6594/12372), 4.13 GiB | 2.38 MiB/s
```

The clone process just freezes at a certain percentage (as soon as there are too many connections).

I think this bug should be reopened.


@zeripath commented on GitHub (Oct 31, 2019):

#8528 was only backported to 1.10 as #8618; it was not backported to 1.9.5.

Setting MAX_OPEN_CONNS won't have any effect on 1.9.5.

Please try on 1.10-rc2 or master.


@m-a-v commented on GitHub (Oct 31, 2019):

**master (CONN_MAX_LIFETIME = 45s, MAX_IDLE_CONNS = 10, MAX_OPEN_CONNS = 10)**

```
bash-5.0# netstat -ant | awk '{print $6}' | sort | uniq -c | sort -n
      1 FIN_WAIT1
      1 Foreign
      1 established)
      5 ESTABLISHED
      5 LISTEN
   8041 TIME_WAIT
```

The checkout succeeds, but many connections still remain in TIME_WAIT status. If multiple clients accessed the LFS server, it could not handle the load.


@zeripath commented on GitHub (Oct 31, 2019):

Your max lifetime is probably too low; 45s seems aggressive.

Are you sure all of those connections are DB connections? Lots of HTTP connections will be made when dealing with lots of LFS objects. (There are probably further efficiencies we could gain.)

If they're all DB connections then multiple users won't change it; you're likely at your max, which should be mathematically determinable:

    Total connections: C = O (open) + I (idle) + W (time_wait)

If max open = max idle:

    max C = O + W
    dC/dt = dO/dt + dW/dt
    max dO/dt = 0  (open connections are capped)
    max dW/dt = max_o / max_l - W / max_tw

dC/dt is positive around C = 0, therefore dC/dt = 0 should represent the maximum for positive C, and thence W is maximized at:

    max_W = max_tw * max_o / max_l

If they're all DB connections then either you have a very long max_tw or I've messed up my maths somewhere.

You can set your time_wait at a server network stack level.
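Plugging numbers into that bound, with max_o = 10 and max_l = 45 s from the experiment above, and assuming the common Linux TIME_WAIT interval of 60 s (an assumption, not a value stated in the thread):

```latex
\max W \;=\; \frac{t_{\mathrm{tw}}\, o_{\max}}{l_{\max}}
       \;=\; \frac{60\,\mathrm{s} \cdot 10}{45\,\mathrm{s}}
       \;\approx\; 13
```

That is orders of magnitude below the ~8000 TIME_WAIT sockets observed, which would support the suspicion that most of them are HTTP connections rather than DB connections.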


@m-a-v commented on GitHub (Oct 31, 2019):

I've chosen the 45 seconds from the discussion between you and @guillep2k in #8528.

How are the connections reused? Where is this done in the code? I assume that after a connection is closed it will go into the TIME_WAIT state.

I don't know if all are db connections. Why did it work with 1.7.4 almost perfectly (see above)?


@m-a-v commented on GitHub (Oct 31, 2019):

This could be interesting:
https://stackoverflow.com/questions/1931043/avoiding-time-wait

"Probably the best option, if it's doable: refactor your protocol so that connections that are finished aren't closed, but go into an "idle" state so they can be re-used later, instead of opening up a new connection (like HTTP keep-alive)."

"Setting SO_REUSEADDR on the client side doesn't help the server side unless it also sets SO_REUSEADDR"


@guillep2k commented on GitHub (Oct 31, 2019):

@zeripath @m-a-v It should be noted that not all TIME_WAIT connections are from the database. Internal requests (e.g. the `internal` router) and many others will create quick HTTP connections that may or may not be reused.

@m-a-v it would be cool if you'd break your statistics down by listening port number.


@guillep2k commented on GitHub (Oct 31, 2019):

"Probably the best option, if it's doable: refactor your protocol so that connections that are finished aren't closed, but go into an "idle" state so they can be re-used later, instead of opening up a new connection (like HTTP keep-alive)."

"Setting SO_REUSEADDR on the client side doesn't help the server side unless it also sets SO_REUSEADDR"

I don't think `SO_REUSEADDR` applies here. If you're down to this level of optimization, I'd suggest [tuning the `tcp_fin_timeout`](https://support.hpe.com/hpsc/doc/public/display?docId=emr_na-c02226086) parameter in the kernel. Too short a value will have ill side effects, though; I wouldn't set it below 30 seconds.

But TIME_WAIT is actually the symptom, not the problem.
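On Linux both of the relevant knobs live in procfs and can be inspected directly. One caveat worth adding (an editorial aside, not stated in the thread): `tcp_fin_timeout` controls how long orphaned sockets linger in FIN-WAIT-2; the TIME_WAIT interval itself is fixed at 60 s in mainline kernels, so TIME_WAIT buildup is better addressed by reusing connections than by kernel tuning.

```shell
# FIN-WAIT-2 timeout in seconds (tunable, e.g. sysctl -w net.ipv4.tcp_fin_timeout=30)
cat /proc/sys/net/ipv4/tcp_fin_timeout

# Ephemeral port range; every TIME_WAIT socket pins one port of this range
# for its address pair, so exhausting it stalls new outgoing connections.
cat /proc/sys/net/ipv4/ip_local_port_range
```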


@m-a-v commented on GitHub (Nov 1, 2019):

@guillep2k What exactly do you mean by "it would be cool if you'd break your statistics down by listening port number"?

tcp_fin_timeout is set to 60 seconds on my system. Ubuntu 18.04 LTS standard configuration.

The question still remains. Why did it work perfectly with 1.7.4 (and earlier) and now anymore?


@guillep2k commented on GitHub (Nov 1, 2019):

@m-a-v

```shell
# netstat -ant | grep TIME_WAIT | awk '{print $5 " " $6}' | cut -d: -f2 | sort | uniq -c | sort -n
```
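That pipeline keys on the remote (foreign) port, which separates outbound DB connections (MySQL's default 3306) from HTTP traffic to Gitea's web port (3000 in this sketch; both port numbers are illustrative). A canned demonstration of the breakdown:

```shell
# Run the same breakdown on fabricated netstat-style lines:
# field 5 is the foreign address:port, field 6 the TCP state.
printf '%s\n' \
  'tcp 0 0 127.0.0.1:40001 127.0.0.1:3306 TIME_WAIT' \
  'tcp 0 0 127.0.0.1:40002 127.0.0.1:3306 TIME_WAIT' \
  'tcp 0 0 10.0.0.2:51000 10.0.0.1:3000 TIME_WAIT' |
  awk '{print $5 " " $6}' | cut -d: -f2 | sort | uniq -c | sort -n
# -> one line counting port 3000, one counting port 3306
```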

@guillep2k commented on GitHub (Nov 1, 2019):

> The question still remains. Why did it work perfectly with 1.7.4 (and earlier) and now anymore?

I don't know, I'd need to check the code. The important thing is that it's taken care of now. 😁


@m-a-v commented on GitHub (Nov 1, 2019):

> The question still remains. Why did it work perfectly with 1.7.4 (and earlier) and now anymore?

> I don't know, I'd need to check the code. The important thing is that it's taken care of now. 😁

I meant "and now **not** anymore".


@guillep2k commented on GitHub (Nov 1, 2019):

I meant "and now not anymore".

I meant it's now solved by properly handling CONN_MAX_LIFETIME, MAX_IDLE_CONNS and MAX_OPEN_CONNS.

@m-a-v If you want to investigate what's the specific change between 1.7.4 and 1.9.5 that caused this, I'd be interested in learning about your results.


@gabyx commented on GitHub (Dec 20, 2019):

**On 1.7.4 (9f33aa6)** I also had lots of connections at the peak while cloning, during `Filtering...` (LFS smudge):

```shell
$ netstat -ant | awk '{print $6}' | sort | uniq -c | sort -n
      1 Foreign
      1 established)
      5 LISTEN
     10 ESTABLISHED
   8599 TIME_WAIT
```

With `git lfs push --all origin`, at the peak:

```shell
$ netstat -ant | grep TIME_WAIT | awk '{print $5 " " $6}' | cut -d: -f2 | sort | uniq -c
66
```

Suddenly the client hangs at 97%. `GIT_TRACE=true` does not show anything; it just hangs. Possibly not related to Gitea.


@gabyx commented on GitHub (Jan 7, 2020):

**On 1.11.0+dev-563-gbcac7cb93:**

```shell
netstat -ant | grep TIME_WAIT | awk '{print $5 " " $6}' | cut -d: -f2 | sort | uniq -c
```

The peak is 280 connections in TIME_WAIT.


Reference: github-starred/gitea#4008