Mirror of https://github.com/go-gitea/gitea.git, synced 2026-03-22 22:45:27 -05:00
Make the default /robots.txt reject all crawlers #255
Closed · opened 2025-11-02 03:16:06 -06:00 by GiteaMirror · 18 comments
Labels: type/docs
Reference: github-starred/gitea#255
Originally created by @sztanpet on GitHub (Jan 20, 2017).
I am wondering whether this is going too far. In my mind, privately set-up Gitea instances should be private by default, and that entails rejecting crawlers too, as a way to reduce surprise for the user.
@sztanpet commented on GitHub (Jan 20, 2017):
Not to mention being as secure as possible by default while still being easy to use, which should entail hiding version numbers, disabling Gravatar and other information-leaking features, making repositories private by default, etc., but that is a separate discussion.
@bkcsoft commented on GitHub (Jan 20, 2017):
Maybe not make it the default, but if REQUIRE_SIGNIN_VIEW is set to true and /robots.txt isn't found, Gitea could provide a default "block all robots" robots.txt 🙂
@bkcsoft commented on GitHub (Jan 20, 2017):
(Since if REQUIRE_SIGNIN_VIEW is set, it seems moot for a crawler to crawl it 😛)
@sztanpet commented on GitHub (Jan 20, 2017):
Well, yes, but at that point it doesn't really matter, so I think it doesn't go far enough.
@tboerger commented on GitHub (Jan 20, 2017):
IMHO we should not block everything by default. Surely there are enough instances that don't want to block everything. Private repositories are blocked anyway, because they aren't accessible to crawlers. If somebody really wants to enforce blocking, they can add a robots.txt to the custom folder.
@strk commented on GitHub (Jan 26, 2017):
I've just had a problem with robots, but in my case the service runs from a sub-URL, so serving a robots.txt from Gitea would not have helped, unless I'm missing a specification that allows for that. What I've read (not much) came from http://www.robotstxt.org/robotstxt.html
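strk's point follows from where crawlers look for the file: RFC 9309 places robots.txt at /robots.txt on the root of the host, so any sub-path an application lives under is simply dropped. A minimal Go sketch; robotsURLFor is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"net/url"
)

// robotsURLFor returns the URL crawlers will request for robots.txt, given
// the public URL of an application. Only scheme and host (including any port)
// survive; the application's sub-path plays no role.
func robotsURLFor(appURL string) (string, error) {
	u, err := url.Parse(appURL)
	if err != nil {
		return "", err
	}
	root := url.URL{Scheme: u.Scheme, Host: u.Host, Path: "/robots.txt"}
	return root.String(), nil
}

func main() {
	for _, s := range []string{"https://example.com/", "https://example.com/git/"} {
		r, err := robotsURLFor(s)
		if err != nil {
			panic(err)
		}
		fmt.Println(s, "->", r)
	}
}
```

For an instance at https://example.com/git/, crawlers fetch https://example.com/robots.txt, which is answered by whatever serves the root of the host, not by Gitea.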
For top-level installs, generating a robots.txt would indeed be good, as it would allow, for example, preventing bots from downloading archives for every committish, which in turn fills up disk space (see #769). According to the reading above (robotstxt.org), you cannot use globs in a robots.txt file, so generating it automatically helps on instances where anyone can create new repos.
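Because the original robots.txt convention matches plain path prefixes without wildcards, blocking archives on an open instance means one Disallow line per repository, regenerated whenever repos are added. The Go sketch below assumes archive URLs of the form /{owner}/{repo}/archive/; archiveRules is a hypothetical helper and the repository names are made up.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// archiveRules builds a robots.txt that disallows the (assumed) archive
// download prefix of every repository, one prefix rule per repo, sorted so
// the output is stable between regenerations.
func archiveRules(repos []string) string {
	sorted := append([]string(nil), repos...)
	sort.Strings(sorted)

	var b strings.Builder
	b.WriteString("User-agent: *\n")
	for _, r := range sorted {
		fmt.Fprintf(&b, "Disallow: /%s/archive/\n", strings.Trim(r, "/"))
	}
	return b.String()
}

func main() {
	fmt.Print(archiveRules([]string{"bob/tools", "alice/site"}))
}
```

RFC 9309 and many major crawlers now also understand * and $ in rules, so for those crawlers a single pattern may suffice; the per-repository list is the conservative choice for crawlers that only implement the original prefix matching.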
@lunny commented on GitHub (Oct 14, 2019):
We could have two examples, one for private sites and another for public sites.
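The two examples could differ as simply as this hypothetical Go helper suggests: the private variant disallows everything, while the public variant allows crawling except for any prefixes the admin lists. The function name and the /user/login example path are illustrative assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// exampleRobots returns one of two illustrative robots.txt variants.
// extraDisallow lists path prefixes a public site may still want crawlers to
// skip; it is ignored for private sites.
func exampleRobots(private bool, extraDisallow []string) string {
	if private {
		// Private site: ask every compliant crawler to skip everything.
		return "User-agent: *\nDisallow: /\n"
	}
	var b strings.Builder
	b.WriteString("User-agent: *\n")
	if len(extraDisallow) == 0 {
		// An empty Disallow value means nothing is disallowed.
		b.WriteString("Disallow:\n")
	}
	for _, p := range extraDisallow {
		fmt.Fprintf(&b, "Disallow: %s\n", p)
	}
	return b.String()
}

func main() {
	fmt.Print(exampleRobots(true, nil))
	fmt.Print(exampleRobots(false, []string{"/user/login"}))
}
```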
@guillep2k commented on GitHub (Oct 14, 2019):
I agree with some comments I've read: Gitea should come with a sensible default robots.txt for public sites, installed by default rather than shipped as a sample. Users will of course be able to replace it as they see fit.
BTW, what is a robots.txt used for on private sites?
EDIT: I thought it meant intranet sites, sorry!
@zeripath commented on GitHub (Oct 14, 2019):
I agree it is probably reasonable to provide a sensible example robots.txt for a basic public site; that's specific knowledge that's appropriate for Gitea. For private sites, we could put something in the website documentation, but it's basically:
I guess we have to decide what level of basic support we should give, but our documentation is supposed to cover Gitea-specific information. This would probably count as basic hardening and is therefore just about appropriate.
@8ctopus commented on GitHub (Dec 31, 2019):
I just had the unpleasant surprise of seeing my private Gitea repository indexed. I naively thought search engines would not find the subfolder on my website, but they did.
Based on my experience, I would suggest:
@tboerger commented on GitHub (Dec 31, 2019):
Private repositories won't get indexed. You simply have public repos, which will naturally be indexed if Google or other search engines find them.
@8ctopus commented on GitHub (Jan 4, 2020):
@tboerger what I mean is that I have a repo I need to share with fellow team members but don't want indexed by search engines. For ease of use, I also opted to make the URL publicly accessible to anyone who knows the address.
@tboerger commented on GitHub (Jan 4, 2020):
Then you should add a custom robots.txt. Not everybody wants to hide all their repos. If something is private, make it private. Everything else is generally fine to be indexed.
The few who want to avoid it can add a robots.txt to their customization.
@lunny commented on GitHub (Oct 15, 2020):
A config item, with an option on the installation page, should let users choose whether to allow crawlers.
@techknowlogick commented on GitHub (Dec 9, 2020):
Documentation has been added for those that want to change their install.
@alexanderadam commented on GitHub (Dec 9, 2020):
In case someone is looking for it, you can find it here.
@Mikaela commented on GitHub (Dec 9, 2020):
I was hoping it would be something like https://git.nixnet.services/robots.txt, advising which addresses to block so the same page doesn't appear in multiple languages, and how to allow indexing while excluding specific commits and similar pages that most likely aren't useful to a casual search engine user.
I think some explanation of the X-Robots-Tag header, or including it in Gitea and explaining its relationship to robots.txt, could also be useful, but I guess that is a separate issue.
@techknowlogick commented on GitHub (Dec 9, 2020):
@Mikaela for the latest stable version, I contributed a PR that replaces the language-switching links in the footer with an alternative that provides the same functionality without exposing the links to crawlers.