mirror of
https://github.com/go-gitea/gitea.git
synced 2026-03-13 02:57:44 -05:00
Indexer returns no results for some terms #4707
Closed
opened 2025-11-02 06:00:19 -06:00 by GiteaMirror
·
27 comments
Reference: github-starred/gitea#4707
Originally created by @gerroon on GitHub (Jan 22, 2020).
Description
I enabled the indexer, and it has been running for a couple of days since then. I am able to search and get some results, but some terms return no results on the Code search page, while I can get tens of results for them with grep.
For the term "tool_set" in the Code search page I get
No source code matching your search term found.
Grepping the same code base (even after deleting the comment lines):
find -type f -name "*.py" -exec grep -i 'tool_set' {} \; |sed '/#/d' |wc -l
44
@guillep2k commented on GitHub (Jan 23, 2020):
The indexer itself can handle your case. I've specifically tested with tool_set and it was indexed correctly when I ran the indexer from scratch. The indexer is having some problems, however, because I'm getting errors in the log that I can't pinpoint, and they clog the indexer queue. If I restart the instance and commit new changes to the repository, the indexer seems to pick them up correctly.
The indexer is expected to take a "long time" to build, but not days. It took a couple of minutes to build from scratch my indexes on 327 MB of repositories.
@gerroon commented on GitHub (Jan 23, 2020):
That is interesting. Where is a good place to see the indexer having issues? I grepped the Gitea log, but there is not much that I can see:
https://paste.debian.net/hidden/3591c5ca/
I also wonder if there is a limit to the size of the indexer db. Mine is at 285 MB now and I have many repos in there.
@guillep2k commented on GitHub (Jan 23, 2020):
My log configuration in app.ini: (It's a little redacted, so maybe not all options make sense)
This separates the SQL (XORM) log from the other logs, making everything cleaner. I've also set up a trace to every error, so I know exactly where every log is produced.
To get a meaningful log I stopped Gitea and deleted the repos.bleve directory to force the system to rebuild the indexes when restarted. You'll know it has finished when the directory stops growing (which is not necessarily when the log says it does... in fact my log was not useful about that).
Then I edited a file using the web UI, and when the indexer attempted to do its thing, it crashed.
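The "wait until it stops growing" check described above can be scripted. This is only a rough sketch: `wait_until_stable` is a hypothetical helper, the index path in the usage comment is an assumption (use whatever your app.ini points the repo indexer at), and two equal size readings only suggest the rebuild has finished, they do not prove it.

```shell
# Poll a directory's size until two consecutive readings match.
# Usage: wait_until_stable DIR [INTERVAL_SECONDS]
wait_until_stable() {
    dir=$1
    interval=${2:-60}
    prev=-1
    while :; do
        # du -sk prints "<kilobytes><TAB><path>"; keep only the number.
        size=$(du -sk "$dir" | cut -f1)
        if [ "$size" = "$prev" ]; then
            # Same size as the previous reading: report it and stop.
            echo "$size"
            return 0
        fi
        prev=$size
        sleep "$interval"
    done
}

# Example (hypothetical path; stop Gitea before deleting the index):
#   rm -r /var/lib/gitea/indexers/repos.bleve
#   ...start Gitea again...
#   wait_until_stable /var/lib/gitea/indexers/repos.bleve 60
```

A longer interval makes a false "finished" less likely if the indexer pauses between repositories.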
(NOTE: your paste doesn't say much, unfortunately)
@lunny commented on GitHub (Jan 23, 2020):
@guillep2k what's the gitea version?
@guillep2k commented on GitHub (Jan 23, 2020):
@lunny I've tested on master as of today. (53f9dbfc7b)
@guillep2k commented on GitHub (Jan 23, 2020):
BTW, the indexes of my prod instance are 1.3GB from 1.4GB of repositories (working fine on Gitea 1.10.3).
@gerroon commented on GitHub (Jan 23, 2020):
@guillep2k I will test with the latest rc2 from today. I will delete the database and force it again.
Btw is there a way to force the indexer while gitea is running?
@guillep2k commented on GitHub (Jan 23, 2020):
If by force you mean rebuild all, no, there isn't. But files are re-indexed with each commit (only the affected files; each affected file is re-indexed in full, not just the diff).
@gerroon commented on GitHub (Jan 23, 2020):
Hmm the latest rc2 fails on me with
@lunny commented on GitHub (Jan 23, 2020):
Could you find the file rupture_sharded_meta.json in the indexer directory?
@gerroon commented on GitHub (Jan 23, 2020):
There is no rupture_sharded_meta.json
@lunny commented on GitHub (Jan 23, 2020):
@gerroon could you paste the content of those two files?
@gerroon commented on GitHub (Jan 23, 2020):
@gerroon commented on GitHub (Jan 23, 2020):
OK, I deleted the whole indexer thing and installed the latest nightly (v1.11.0-rc2). The database grew to 3 GB.
However, it still can't find tool_set.
I did another search for builtin. It located about 30 results across all the Gitea-controlled repos. Since I do not have clones of all the repos, I searched for builtin in the largest one I have cloned. The difference is huge, not even close (30 vs 542).
@gerroon commented on GitHub (Jan 23, 2020):
One thing I am seeing is that the process
183.27 K/s 0.00 B/s 0.00 % 95.49 % gitea web -c /opt/gitea/custom/conf/app.ini
is doing constant reading (holding 99% of the system I/O) without writing, and it never gives up whatever it is doing. The database store file was last updated about 4 hours ago. So whatever is being read from the disk is not written back, given that the database file has not been updated for about 4 hours?
Here is the lsof for gitea
@guillep2k commented on GitHub (Jan 23, 2020):
It would be useful to have some logs for the time span of your tests.
EDIT: (I mean, for context)
@gerroon commented on GitHub (Jan 23, 2020):
I would like to, but there is a lot of personal information in the logs, a lot about my projects, issues, wikis, etc. If you can tell me what specifically you are looking for, such as crashes, I can definitely provide it. But I am not seeing any of those there.
@guillep2k commented on GitHub (Jan 24, 2020):
I think I've found an important bug! But it should only manifest itself as repos not being updated (creation of indexes from scratch should not be affected).
As for the error message in my instance:
I've been debugging and it turns out this error is expected as I have one corrupt repo, so git show-ref -s returns... a silent exit status of 1. I believe this should not affect the indexing of other repos, because the error is logged and the indexer just continues processing its queue.
About the bug I've mentioned, I'll post a PR momentarily.
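To spot repositories where git show-ref -s exits non-zero without printing anything, a check along these lines may help. It is a sketch only: `check_refs` is a hypothetical helper name, and a non-zero exit also occurs for an empty repository or a path that is not a repository at all, so a "no-refs" result is a hint to look closer, not proof of corruption.

```shell
# Report whether `git show-ref -s` succeeds for a repository.
# Usage: check_refs REPO_DIR
check_refs() {
    if git -C "$1" show-ref -s >/dev/null 2>&1; then
        echo "ok"
    else
        # Non-zero exit: no refs could be listed
        # (for example an empty or damaged repository).
        echo "no-refs"
    fi
}

# Example (hypothetical repository root):
#   for repo in /path/to/gitea-repositories/*/*.git; do
#       printf '%s %s\n' "$(check_refs "$repo")" "$repo"
#   done
```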
@gerroon commented on GitHub (Jan 24, 2020):
Sounds good.
I just started from scratch again. This time I added an include-files list so that the scope is limited, since I am mostly interested in txt and py files (my repos have a lot of binary files too). I will report back if that does any good.
@gerroon commented on GitHub (Jan 24, 2020):
OK, that did not work perfectly either. Here is the result from the Gitea code search page for builtin.transform. I am only including the results from the same repo in Code search and the grep search.
One speculation I can make is that Code search seems to return only one result per file (compare it to the grep search), which may be one of the culprits, if not the whole problem.
And here is the same search from the terminal:
grep -ir "builtin.transform" *
It is still not reporting anything about "tool_set" for the repo listed above, but see what ack returns for the same repo.
@guillep2k commented on GitHub (Jan 24, 2020):
Oh! 🤦♂
The indexer indexes only the first instance of any term per file. It's not meant to be a full text search.
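Since the code search reports a file once however many times the term occurs in it, a line count from grep is not directly comparable to the number of search hits. A minimal sketch of a fairer comparison, assuming a local clone of the repository; `count_matches` is a hypothetical helper name, not part of Gitea:

```shell
# Count matching lines vs. matching files for a term
# (case-insensitive, fixed-string match, recursive).
# Usage: count_matches TERM [DIR]
count_matches() {
    term=$1
    dir=${2:-.}
    # Every matching line, as a plain grep reports them.
    lines=$(grep -riF -- "$term" "$dir" | wc -l)
    # Each file counted once, closer to a one-hit-per-file search.
    files=$(grep -rilF -- "$term" "$dir" | wc -l)
    # $((...)) strips any padding that wc adds on some systems.
    echo "lines=$((lines)) files=$((files))"
}

# Example: count_matches "builtin.transform" /path/to/clone
```

If the files count is still well above what Code search returns, per-file counting alone does not explain the gap.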
@gerroon commented on GitHub (Jan 24, 2020):
Interesting. Then maybe it is not even going to return partial results?
Here tool_set_by_name returns 2 results from the whole of Gitea. Meanwhile grep can return many for a single repo. Maybe that explains, in some way, why "tool_set" returns none?
@guillep2k commented on GitHub (Jan 24, 2020):
It should return a result for each file where it occurs, as long as it's in master (or whatever branch is your default) and "indexable" (i.e. not filtered out by your settings or ... ahem ... perhaps your files are marked as executable?). 😳
@zeripath commented on GitHub (Jan 24, 2020):
as per @guillep2k
@vvrein commented on GitHub (Jan 25, 2020):
Might https://github.com/go-gitea/gitea/issues/9190#issuecomment-571563226 be related to this issue?
@vvrein commented on GitHub (Jan 25, 2020):
Re-checked the https://github.com/go-gitea/gitea/issues/9190#issuecomment-571563226 behavior with the latest upstream version, 1.12.0+dev-174-g5b17bb8f3.
Seems to be working now!
The repo index was updated after git push.
@zeripath commented on GitHub (Jan 25, 2020):
@vvrein I'm gonna close this as Fixed by #9965 and #9957