Mirror of https://github.com/go-gitea/gitea.git (synced 2026-03-12 02:24:21 -05:00)
LFS: Cloning objects / batch not found #4008
Closed · opened 2025-11-02 05:33:53 -06:00 by GiteaMirror · 49 comments
Labels: Mirrored from GitHub Pull Request, type/bug
Reference: github-starred/gitea#4008
Originally created by @gabyx on GitHub (Sep 24, 2019).
Description
When I upload a repo with LFS objects, the upload mostly works.
While cloning, the LFS smudge filter (here at 58%) always stalls after some time, saying
After a night of debugging (updating successively through all versions with Docker), we came to the conclusion that
Could it be that the following submissions into 1.8.3 are problematic:
The hints/workarounds in the discussion below did not solve this issue:
https://discourse.gitea.io/t/solved-git-lfs-upload-repeats-infinitely/635/2
Hopefully this gets some attention, since it's a nasty LFS bug which almost turned us into apple crumble. 🍎
@m-a-v commented on GitHub (Sep 27, 2019):
I've made some more tests. After compiling the version at commit dbd0a2e ("Fix LFS Locks over SSH (#6999) (#7223)"), the error appears. The LFS data is large (approximately 10 GB). One commit before (7697a28), everything works perfectly. I've tried disabling the SSH server, but this doesn't change anything.
@zeripath Let me know if you need more information.
@m-a-v commented on GitHub (Sep 27, 2019):
Here you can see the debug log output when the error occurs (PANIC: runtime error: invalid memory address or nil pointer dereference):
2019/09/27 20:44:19 [D] Could not find repository: company/repository - dial tcp 172.18.0.6:3306: connect: cannot assign requested address
2019/09/27 20:44:19 [D] LFS request - Method: GET, URL: /company/repository.git/info/lfs/objects/063e23a8631392cc939b6b609df91e02d064f3fe279522c3eefeb1c5f1d738a3, Status 404
2019/09/27 20:44:19 [...les/context/panic.go:36 1()] [E] PANIC:: runtime error: invalid memory address or nil pointer dereference
    /usr/local/go/src/runtime/panic.go:82 (0x44abc0)
    /usr/local/go/src/runtime/signal_unix.go:390 (0x44a9ef)
    /go/src/code.gitea.io/gitea/models/repo_permission.go:120 (0x108a0ed)
    /go/src/code.gitea.io/gitea/models/repo_permission.go:120 (0x108a0ed)
    /go/src/code.gitea.io/gitea/models/repo_permission.go:95 (0x1183338)
    /go/src/code.gitea.io/gitea/modules/lfs/server.go:501 (0x118330a)
    /go/src/code.gitea.io/gitea/modules/lfs/server.go:128 (0x117f2dd)
    /go/src/code.gitea.io/gitea/modules/lfs/server.go:146 (0x117f468)
    /go/src/code.gitea.io/gitea/modules/lfs/server.go:105 (0x117ef90)
    /usr/local/go/src/reflect/value.go:447 (0x4cb930)
    /usr/local/go/src/reflect/value.go:308 (0x4cb3b3)
    /go/src/code.gitea.io/gitea/vendor/github.com/go-macaron/inject/inject.go:177 (0x9a1466)
    /go/src/code.gitea.io/gitea/vendor/github.com/go-macaron/inject/inject.go:137 (0x9a0d5b)
    /go/src/code.gitea.io/gitea/vendor/gopkg.in/macaron.v1/context.go:121 (0x9cff19)
    /go/src/code.gitea.io/gitea/vendor/gopkg.in/macaron.v1/context.go:112 (0x11667e8)
    /go/src/code.gitea.io/gitea/modules/context/panic.go:40 (0x11667db)
    /usr/local/go/src/reflect/value.go:447 (0x4cb930)
    /usr/local/go/src/reflect/value.go:308 (0x4cb3b3)
    /go/src/code.gitea.io/gitea/vendor/github.com/go-macaron/inject/inject.go:177 (0x9a1466)
    /go/src/code.gitea.io/gitea/vendor/github.com/go-macaron/inject/inject.go:137 (0x9a0d5b)
    /go/src/code.gitea.io/gitea/vendor/gopkg.in/macaron.v1/context.go:121 (0x9cff19)
    /go/src/code.gitea.io/gitea/vendor/gopkg.in/macaron.v1/context.go:112 (0x9efe76)
    /go/src/code.gitea.io/gitea/vendor/github.com/go-macaron/session/session.go:192 (0x9efe61)
    /go/src/code.gitea.io/gitea/vendor/gopkg.in/macaron.v1/context.go:79 (0x9cfdc0)
    /go/src/code.gitea.io/gitea/vendor/github.com/go-macaron/inject/inject.go:157 (0x9a1120)
    /go/src/code.gitea.io/gitea/vendor/github.com/go-macaron/inject/inject.go:135 (0x9a0e4a)
    /go/src/code.gitea.io/gitea/vendor/gopkg.in/macaron.v1/context.go:121 (0x9cff19)
    /go/src/code.gitea.io/gitea/vendor/gopkg.in/macaron.v1/context.go:112 (0x9e197f)
    /go/src/code.gitea.io/gitea/vendor/gopkg.in/macaron.v1/recovery.go:161 (0x9e196d)
    /go/src/code.gitea.io/gitea/vendor/gopkg.in/macaron.v1/logger.go:40 (0x9d3bb3)
    /go/src/code.gitea.io/gitea/vendor/github.com/go-macaron/inject/inject.go:157 (0x9a1120)
    /go/src/code.gitea.io/gitea/vendor/github.com/go-macaron/inject/inject.go:135 (0x9a0e4a)
    /go/src/code.gitea.io/gitea/vendor/gopkg.in/macaron.v1/context.go:121 (0x9cff19)
    /go/src/code.gitea.io/gitea/vendor/gopkg.in/macaron.v1/context.go:112 (0x9e0ca0)
    /go/src/code.gitea.io/gitea/vendor/gopkg.in/macaron.v1/logger.go:52 (0x9e0c8b)
    /go/src/code.gitea.io/gitea/vendor/gopkg.in/macaron.v1/logger.go:40 (0x9d3bb3)
    /go/src/code.gitea.io/gitea/vendor/github.com/go-macaron/inject/inject.go:157 (0x9a1120)
    /go/src/code.gitea.io/gitea/vendor/github.com/go-macaron/inject/inject.go:135 (0x9a0e4a)
    /go/src/code.gitea.io/gitea/vendor/gopkg.in/macaron.v1/context.go:121 (0x9cff19)
    /go/src/code.gitea.io/gitea/vendor/gopkg.in/macaron.v1/router.go:187 (0x9e2bc6)
    /go/src/code.gitea.io/gitea/vendor/gopkg.in/macaron.v1/router.go:303 (0x9dc635)
    /go/src/code.gitea.io/gitea/vendor/gopkg.in/macaron.v1/macaron.go:220 (0x9d4f8c)
    /go/src/code.gitea.io/gitea/vendor/github.com/gorilla/context/context.go:141 (0xce374a)
    /usr/local/go/src/net/http/server.go:1995 (0x6f63a3)
    /usr/local/go/src/net/http/server.go:2774 (0x6f9677)
    /usr/local/go/src/net/http/server.go:1878 (0x6f5360)
    /usr/local/go/src/runtime/asm_amd64.s:1337 (0x464c20)
2019/09/27 20:44:19 [D] Template: status/500
2019/09/27 20:44:19 [...les/context/panic.go:36 1()] [E] PANIC:: runtime error: invalid memory address or nil pointer dereference
    (identical stack trace repeated)
@m-a-v commented on GitHub (Sep 28, 2019):
I suppose that Gitea is exceeding the number of local socket connections permitted by the OS.
Failure: cannot assign requested address
See also explanation and possible solution here:
https://github.com/golang/go/issues/16012#issuecomment-224948823
Where could I change the setting MaxIdleConnsPerHost and other LFS server settings to make further tests?
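For context, MaxIdleConnsPerHost is a field on Go's http.Transport. Below is a minimal, hypothetical sketch of tuning it on a plain client so that repeated requests to the same host reuse pooled keep-alive connections instead of dialing a new socket (which later lingers in TIME_WAIT) per object. The URL and the numeric values are placeholders; this is not Gitea's or git-lfs's actual code:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"time"
)

func main() {
	// Hypothetical tuning: allow more idle keep-alive connections per host,
	// so sequential requests reuse sockets instead of dialing new ones
	// (Go's default of 2 idle connections per host forces extra dials under load).
	transport := &http.Transport{
		MaxIdleConns:        100,
		MaxIdleConnsPerHost: 20,
		IdleConnTimeout:     90 * time.Second,
	}
	client := &http.Client{Transport: transport, Timeout: 30 * time.Second}

	for i := 0; i < 3; i++ {
		resp, err := client.Get("https://example.com/") // placeholder URL
		if err != nil {
			fmt.Println("request failed:", err)
			return
		}
		// Draining and closing the body is what returns the connection
		// to the idle pool for reuse.
		io.Copy(io.Discard, resp.Body)
		resp.Body.Close()
	}
}
```

Where an equivalent knob would live for Gitea's LFS handling is exactly the open question in this thread.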
@m-a-v commented on GitHub (Sep 28, 2019):
BTW: The error PANIC:: runtime error: invalid memory address or nil pointer dereference does not always appear in the log output. Sometimes the server and client just hang.
@m-a-v commented on GitHub (Sep 28, 2019):
@lunny Who could help to isolate this bug? Is there any Gitea programmer who could support us? I am willing to make more tests but I need some hints.
@gabyx commented on GitHub (Sep 29, 2019):
@m-a-v: There is also a setting:
which will probably affect the transfer; nevertheless, it should not crash the server...
@gabyx commented on GitHub (Sep 29, 2019):
Another interesting read: https://www.fromdual.com/huge-amount-of-time-wait-connections
@lunny commented on GitHub (Sep 30, 2019):
@m-a-v I think @zeripath maybe. But if not, I can take a look at this.
@m-a-v commented on GitHub (Sep 30, 2019):
The problem seems to be the huge number of connections for the GET requests (more than 10k connections for a single client!). See also here:
https://medium.com/@valyala/net-http-client-has-the-following-additional-limitations-318ac870ce9d.
https://medium.com/@nate510/don-t-use-go-s-default-http-client-4804cb19f779
@zeripath commented on GitHub (Oct 10, 2019):
@m-a-v I've been very busy doing other things for a while so have been away from Gitea. I'll take a look at this.
I think you're on the right trail with the number of connections thing. IIRC there's another person who had a similar issue.
@zeripath commented on GitHub (Oct 10, 2019):
@m-a-v I can't understand why dbd0a2e should break things, but I'll double check. Maybe it's possible the request body isn't being closed or something stupid like that. That would cause a leak if so and could explain the issue.
The other possibility is that dbd0a2e has nothing to do with things and it's a Heisenbug relating to the number-of-connections thing.
@guillep2k commented on GitHub (Oct 10, 2019):
A netstat -an could be useful to see in what state the connections are when this happens. It doesn't need to make Gitea fail, but it will be useful as long as there is a large number of connections listed. It's not the same if the connections are in CONNECTED state, or CLOSE_WAIT, FIN_WAIT1, etc.
@zeripath commented on GitHub (Oct 10, 2019):
OK, so all these calls to ReadCloser() don't Close():
57b0d9a38b/modules/lfs/server.go (L330)
57b0d9a38b/modules/lfs/server.go (L437)
57b0d9a38b/modules/lfs/server.go (L456)
Whether that's the cause of your bug is another question - however, it would fit with dbd0a2e causing more issues, because suddenly you get a lot more calls to unpack. These should be closed, so I guess that's at least a starting point for attempting to fix this. (If I find anything else I will update this.)
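To illustrate the pattern being pointed at, here is a sketch only: the unpack helpers and the JSON payload below are made up for illustration and are not the actual modules/lfs/server.go code. The point is simply that an io.ReadCloser obtained from a request or response must be closed after decoding, or its underlying connection stays held open:

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// unpackLeaky sketches the reported problem: the ReadCloser (for example a
// request body) is decoded but never closed, so its underlying connection or
// file handle stays open after every call.
func unpackLeaky(rc io.ReadCloser, v interface{}) error {
	return json.NewDecoder(rc).Decode(v)
}

// unpack is the corrected variant: deferring Close releases the body on every
// return path, so the connection can be reused or torn down promptly.
func unpack(rc io.ReadCloser, v interface{}) error {
	defer rc.Close()
	return json.NewDecoder(rc).Decode(v)
}

func main() {
	// Stand-in for a request/response body carrying an LFS-batch-style payload.
	body := io.NopCloser(strings.NewReader(`{"oid":"063e23a8","size":1}`))
	var payload map[string]interface{}
	if err := unpack(body, &payload); err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(payload)
}
```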
@zeripath commented on GitHub (Oct 10, 2019):
@m-a-v would you be able to rebuild from my PR #8454 and see if that solves your issue?
@m-a-v commented on GitHub (Oct 11, 2019):
@zeripath Thanks a lot. It may take some time until I can test it, but I certainly will.
@zeripath commented on GitHub (Oct 12, 2019):
It's actually been merged into the 1.10 and 1.9 branches already.
@m-a-v commented on GitHub (Oct 15, 2019):
I've tested it again with 1.10 and it seems that the described LFS bug has been solved, or at least the error no longer appears for this specific scenario. Before @zeripath's fix we had more than 10k connections in a TIME_WAIT state; now there are still approximately 3.5k connections in the TIME_WAIT state. I assume that if multiple clients access the LFS server, the same problem could still occur.
Any idea how to improve this? Are there other possible leaks? I assume that a connection which closes will not remain in a TIME_WAIT state. Can anyone confirm this?
@zeripath commented on GitHub (Oct 15, 2019):
Hi @m-a-v, I guess this means that I must have missed some others. Is there any way of checking that they're all LFS connections?
@m-a-v commented on GitHub (Oct 15, 2019):
Indirectly, yes. I had only one active client. Before LFS checkout I had two connections on the MariaDB database server instance. During LFS checkout about 3.5k connections and then some minutes later again 2 connections.
This article could be interesting:
http://www.serverframework.com/asynchronousevents/2011/01/time-wait-and-its-design-implications-for-protocols-and-scalable-servers.html
@zeripath commented on GitHub (Oct 15, 2019):
LFS checkout causes 3.5K connections?! How many LFS objects do you have?
@m-a-v commented on GitHub (Oct 15, 2019):
12k LFS objects.
@m-a-v commented on GitHub (Oct 15, 2019):
The error appeared again. I have to check this later. Probably next week.
@zeripath commented on GitHub (Oct 15, 2019):
So I've spotted another unclosed thing, which is unlikely to be causing your issue; however, I am suspicious that we're not closing the response body in modules/lfs/server.go.
@guillep2k commented on GitHub (Oct 15, 2019):
From What are CLOSE_WAIT and TIME_WAIT states?
I think you may have a network configuration problem.
TIME_WAIT lingering too much is a common problem for web servers, usually because the default timeout is too long. Search around, because there are many documents dealing with this. Just a "first to show up in a search" pick:
@guillep2k commented on GitHub (Oct 15, 2019):
@zeripath Any connections that Gitea leaves open should remain in either ESTABLISHED or CLOSE_WAIT states.
@zeripath commented on GitHub (Oct 15, 2019):
Could it be that git lfs on the client is also leaking connections?
@guillep2k commented on GitHub (Oct 15, 2019):
That would be either FIN_WAIT_1 or FIN_WAIT_2. TIME_WAIT is a state maintained by the OS to keep the port from being reused (by port I mean the client+server address & port pair).
@guillep2k commented on GitHub (Oct 15, 2019):
This picture should help (but it's not easy to read, so I guess it doesn't):
@m-a-v commented on GitHub (Oct 15, 2019):
I think the problem is more the following:
"Your problem is that you are not reusing your MySQL connections within your app but instead you are creating a new connection every time you want to run an SQL query. This involves not only setting up a TCP connection, but then also passing authentication credentials across it. And this is happening for every query (or at least every front-end web request) and it's wasteful and time consuming."
I think this would also speed up Gitea's LFS server a lot.
source: https://serverfault.com/questions/478691/avoid-time-wait-connections
@zeripath commented on GitHub (Oct 15, 2019):
AHA! Excellent! Well done for finding that!
@zeripath commented on GitHub (Oct 15, 2019):
OK, we do recycle connections. We use the underlying Go sql connection pool.
For MySQL there are the following settings in the [database] part of the app.ini:
MAX_IDLE_CONNS 0: Max idle database connections on connection pool, default is 0
CONN_MAX_LIFETIME 3s: Database connection max lifetime
https://docs.gitea.io/en-us/config-cheat-sheet/#database-database
I think MAX_IDLE_CONNECTIONS was set to 0 because MySQL doesn't like long-lasting connections. I will however make a PR exposing SetConnMaxLifetime. Edit: I'm an idiot, it's already exposed for MySQL.
@zeripath commented on GitHub (Oct 15, 2019):
I think what you need to do is tune those variables better. I think our defaults are highly likely to be incorrect - however, I think they were set to this because of other users complaining of problems.
I suspect that MAX_IDLE_CONNECTIONS being set to 0 happened before we adjusted CONN_MAX_LIFETIME, and we could probably be more generous with both of these, i.e. something like MAX_IDLE_CONNECTIONS 10 and CONN_MAX_LIFETIME 15m would work.
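As a hedged illustration of what those app.ini values correspond to underneath, here is a standalone Go sketch using the standard database/sql pool knobs with the values suggested above. The DSN is a placeholder and this is not Gitea's actual startup code:

```go
package main

import (
	"database/sql"
	"log"
	"time"

	_ "github.com/go-sql-driver/mysql" // registers the "mysql" driver with database/sql
)

func main() {
	// Placeholder DSN; substitute real credentials and host.
	db, err := sql.Open("mysql", "gitea:secret@tcp(127.0.0.1:3306)/gitea")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Standard-library equivalents of the [database] settings discussed above.
	db.SetMaxOpenConns(10)                  // MAX_OPEN_CONNS: cap concurrent connections
	db.SetMaxIdleConns(10)                  // MAX_IDLE_CONNS: keep idle connections around for reuse
	db.SetConnMaxLifetime(15 * time.Minute) // CONN_MAX_LIFETIME: recycle connections periodically

	if err := db.Ping(); err != nil {
		log.Fatal(err)
	}
	log.Println("pool configured")
}
```

With SetMaxIdleConns(0), every connection returned to the pool is closed immediately instead of being kept idle, which is exactly what produces a fresh TCP connection (and, later, a TIME_WAIT socket) per query.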
@m-a-v commented on GitHub (Oct 21, 2019):
I could test it again with the repo. Which branch should I take? Which parameters (I've seen that discussions continued)?
@m-a-v commented on GitHub (Oct 21, 2019):
Did you also fix this?
@m-a-v commented on GitHub (Oct 31, 2019):
I have made several experiments with the currently running Gitea server (v1.7.4) and with the new version (v1.9.5). The netstat snapshots were created at the peak of the number of open connections.
Version 1.7.4
Version 1.9.5 (same default settings as with 1.7.4)
Version 1.9.5 (CONN_MAX_LIFETIME = 45s, MAX_IDLE_CONNS = 10, MAX_OPEN_CONNS = 10)
With both configurations the LFS server has far too many open connections. So I think we still have serious problems with large LFS repos.
The clone process just freezes at a certain percentage (as soon as there are too many connections).
I think this bug should be reopened.
@zeripath commented on GitHub (Oct 31, 2019):
#8528 was only backported to 1.10 as #8618. It was not backported to 1.9.5.
Setting MAX_OPEN_CONNS won't have any effect on 1.9.5.
Please try on 1.10-rc2 or master.
@m-a-v commented on GitHub (Oct 31, 2019):
master (CONN_MAX_LIFETIME = 45s, MAX_IDLE_CONNS = 10, MAX_OPEN_CONNS = 10)
The checkout succeeds, but many of the used connections still remain in TIME_WAIT status. If multiple clients accessed the LFS server, it could not handle it.
@zeripath commented on GitHub (Oct 31, 2019):
Your max lifetime is probably too low, 45s seems aggressive.
Are you sure all of those connections are db connections? Lots of HTTP connections will be made when dealing with lots of LFS objects. (There are probably some more efficiencies we could find.)
If they're all db then multiple users won't change it - you're likely at your max as it should be mathematically determinable:
Total connections = open + idle + timewait
If max open = max idle:
Max C = O + W
dC/dt = dO/dt + dW/dt
max dO/dt = 0 (as it's fixed)
max dW/dt = max_o/max_l - W/max_tw
dC/dt is positive around C=0 therefore dC/dt=0 should represent max for positive C and thence maximize W.
max_W = max_tw * max_o / max_l
If they're all db then you have a very long max tw or I've messed up in my maths somewhere.
You can set your time_wait at a server network stack level.
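As a rough worked instance of that steady-state estimate (assuming the values discussed in this thread: max_o = MAX_OPEN_CONNS = 10, max_l = CONN_MAX_LIFETIME = 45 s, and a typical 60 s TIME_WAIT interval):

$$\max W = \frac{t_{tw}\cdot \mathrm{max\_o}}{\mathrm{max\_l}} = \frac{60\,\mathrm{s}\times 10}{45\,\mathrm{s}} \approx 13$$

So, if these really were all database connections, only on the order of a dozen should ever sit in TIME_WAIT at once; thousands of TIME_WAIT sockets would then have to come from something other than the pooled database connections.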
@m-a-v commented on GitHub (Oct 31, 2019):
I've chosen the 45 seconds from the discussion between you and @guillep2k in #8528.
How are the connections reused? Where is this done in the code? I assume that after a connection is closed it will go into the TIME_WAIT state.
I don't know if all are db connections. Why did it work with 1.7.4 almost perfectly (see above)?
@m-a-v commented on GitHub (Oct 31, 2019):
This could be interesting:
https://stackoverflow.com/questions/1931043/avoiding-time-wait
"Probably the best option, if it's doable: refactor your protocol so that connections that are finished aren't closed, but go into an "idle" state so they can be re-used later, instead of opening up a new connection (like HTTP keep-alive)."
"Setting SO_REUSEADDR on the client side doesn't help the server side unless it also sets SO_REUSEADDR"
@guillep2k commented on GitHub (Oct 31, 2019):
@zeripath @m-a-v It must be noted that not all TIME_WAIT connections are from the database. Internal requests (e.g. the internal router) and many others will create quick HTTP connections that may or may not be reused.
@m-a-v it would be cool if you'd break your statistics down by listening port number.
@guillep2k commented on GitHub (Oct 31, 2019):
I don't think SO_REUSEADDR applies here. If you're down to this level of optimization, I'd suggest tuning the tcp_fin_timeout parameter in the kernel. Too short a value will have ill side effects, though; I wouldn't set it below 30 seconds.
But TIME_WAIT is actually the symptom, not the problem.
@m-a-v commented on GitHub (Nov 1, 2019):
@guillep2k What exactly do you mean by "it would be cool if you'd break your statistics down by listening port number"?
tcp_fin_timeout is set to 60 seconds on my system. Ubuntu 18.04 LTS standard configuration.
The question still remains. Why did it work perfectly with 1.7.4 (and earlier) and now anymore?
@guillep2k commented on GitHub (Nov 1, 2019):
@m-a-v
@guillep2k commented on GitHub (Nov 1, 2019):
I don't know, I'd need to check the code. The important thing is that it's taken care of now. 😁
@m-a-v commented on GitHub (Nov 1, 2019):
I meant "and now not anymore".
@guillep2k commented on GitHub (Nov 1, 2019):
I meant it's now solved by properly handling CONN_MAX_LIFETIME, MAX_IDLE_CONNS and MAX_OPEN_CONNS.
@m-a-v If you want to investigate which specific change between 1.7.4 and 1.9.5 caused this, I'd be interested in learning about your results.
@gabyx commented on GitHub (Dec 20, 2019):
On 1.7.4 (9f33aa6) I had lots of connections at the peak when cloning too, during "Filtering..." -> LFS smudge.
When running git lfs push --all origin, at the peak the client suddenly hangs at 97%.
GIT_TRACE=true does not show anything... it just hangs... possibly not related to Gitea.
@gabyx commented on GitHub (Jan 7, 2020):
on 1.11.0+dev-563-gbcac7cb93:
netstat -ant | grep TIME_WAIT | awk '{print $5 " " $6}' | cut -d: -f2 | sort | uniq -c
Peak is 280 connections in TIME_WAIT.