Indexer returns no results for some terms #4707

Closed
opened 2025-11-02 06:00:19 -06:00 by GiteaMirror · 27 comments

Originally created by @gerroon on GitHub (Jan 22, 2020).

  • Gitea version (or commit ref): 1.11.0-rc1
  • Git version: 2.24.1
  • Operating system: Debian testing
  • Database (use [x]):
    • [ ] PostgreSQL
    • [ ] MySQL
    • [ ] MSSQL
    • [x] SQLite
  • Can you reproduce the bug at https://try.gitea.io:
    • [ ] Yes (provide example URL)
    • [ ] No
    • [x] Not relevant

Description

I enabled the indexer, and it has been running for a couple of days. Search works for some terms, but other terms return no results on the Code search page even though grep finds tens of matches.

For the term "tool_set", the Code search page reports: No source code matching your search term found.

Grepping the same code base (even after deleting the comment lines):

find -type f -name "*.py" -exec grep -i 'tool_set' {} \; |sed '/#/d' |wc -l
44

ini

[indexer]
REPO_INDEXER_ENABLED = true
ISSUE_INDEXER_PATH: indexers/issues.bleve
REPO_INDEXER_PATH: indexers/repos.bleve
UPDATE_BUFFER_LEN: 20
MAX_FILE_SIZE: 1048576

@guillep2k commented on GitHub (Jan 23, 2020):

The indexer itself can handle your case. I've specifically tested with tool_set and it was indexed correctly when I ran the indexer from scratch. The indexer is having some problems, however, because I'm getting errors in the log I can't pinpoint like:

2020/01/22 21:04:05 ...ndexer/code/queue.go:39:processRepoIndexerOperationQueue() [E] indexer.Index: exit status 1
        /home/gprandi/src/code.gitea.io/gitea/modules/indexer/code/queue.go:39 (0x187f5f7)
                processRepoIndexerOperationQueue: log.Error("indexer.Index: %v", err)
        /home/gprandi/go/src/runtime/asm_amd64.s:1357 (0x46f5d0)
                goexit: BYTE    $0x90   // NOP

Which clogs the indexer queue. If I restart the instance and commit new changes to the repository, the indexer seems to pick them up correctly.

The indexer is expected to take a "long time" to build, but not days. It took a couple of minutes to build from scratch my indexes on 327 MB of repositories.

@gerroon commented on GitHub (Jan 23, 2020):

That is interesting. Where is a good place to see the indexer having issues? I grepped the Gitea log but there is not much that I can see:

https://paste.debian.net/hidden/3591c5ca/

I also wonder if there is a limit to the size of the indexer db; mine is at 285 MB now and I have many repos in there.

@guillep2k commented on GitHub (Jan 23, 2020):

My log configuration in app.ini:

[log]
MODE             = file
MAX_DAYS         = 15
LEVEL = Info
ROUTER           = file
ROUTER_LOG_LEVEL = Trace
STACKTRACE_LEVEL = Error
XORM = file
REDIRECT_MACARON_LOG = true

[log.file.xorm]
FILE_NAME = xorm.log

(It's a little redacted, so maybe not all options make sense)

This separates the SQL (XORM) log from the other logs, making everything cleaner. I've also set up a trace to every error, so I know exactly where every log is produced.

To get a meaningful log I stopped Gitea and deleted the repos.bleve directory to force the system to rebuild them when restarted. You'll know it finished when it stops growing (which is not necessarily when the log says it does... in fact my log was not useful about that).
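
The "it stops growing" check can be automated. The following is a minimal sketch, not something Gitea provides: it polls the total size of an index directory and returns once the size has stayed the same for a few polls in a row. The function names, the polling interval and the number of stable polls are all assumptions chosen for illustration.

```python
import os
import time


def dir_size(path):
    """Total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                # Files can disappear while the index is being written.
                pass
    return total


def wait_until_stable(path, interval=60.0, stable_checks=3, sleep=time.sleep):
    """Return the directory size once it was unchanged for stable_checks polls.

    interval and stable_checks are arbitrary illustrative defaults.
    """
    last = dir_size(path)
    unchanged = 0
    while unchanged < stable_checks:
        sleep(interval)
        current = dir_size(path)
        unchanged = unchanged + 1 if current == last else 0
        last = current
    return last
```

Calling `wait_until_stable("indexers/repos.bleve")` from the Gitea working directory would block until the directory size settles. A stable size only suggests that indexing has stopped; it does not prove that it completed successfully.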

Then I've edited a file using the web UI, and when the indexer attempted to do its thing, it crashed.

(NOTE: your paste doesn't say much, unfortunately)

@lunny commented on GitHub (Jan 23, 2020):

@guillep2k what's the gitea version?

@guillep2k commented on GitHub (Jan 23, 2020):

@lunny I've tested on master as of today. (53f9dbfc7b)

@guillep2k commented on GitHub (Jan 23, 2020):

> I also wonder if there is a limit to the size of the indexer db, mine is it at 285mb now and I have many repos in there.

BTW, the indexes of my prod instance are 1.3GB from 1.4GB of repositories (working fine on Gitea 1.10.3).

@gerroon commented on GitHub (Jan 23, 2020):

@guillep2k I will test with the latest rc2 from today. I will delete the database and force it again.

Btw is there a way to force the indexer while gitea is running?

@guillep2k commented on GitHub (Jan 23, 2020):

> Btw is there a way to force the indexer while gitea is running?

If by force you mean rebuild all, no, there isn't. But files are re-indexed with each commit (only the affected files, the whole file is re-indexed, not just the diff).
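
The per-commit behaviour described here can be pictured with a toy model. This is only a sketch of the idea (drop a changed file's old entries, then index its whole new content); it is not Gitea's or bleve's code, and `ToyIndex`, `reindex_file` and `apply_commit` are hypothetical names.

```python
import re
from collections import defaultdict


class ToyIndex:
    """A toy inverted index: term -> set of file paths."""

    def __init__(self):
        self.postings = defaultdict(set)

    def reindex_file(self, path, content):
        # Drop everything previously known about this file ...
        for paths in self.postings.values():
            paths.discard(path)
        # ... then index the complete new content, not a diff.
        for term in re.findall(r"\w+", content.lower()):
            self.postings[term].add(path)

    def search(self, term):
        """Return the sorted paths of files containing the term."""
        return sorted(self.postings.get(term.lower(), ()))


def apply_commit(index, changed_files):
    """Re-index only the files touched by a commit (path -> new content)."""
    for path, content in changed_files.items():
        index.reindex_file(path, content)
```

In this model, files that a commit does not touch keep their existing entries, which is why a full rebuild is only needed when the index itself is missing or damaged.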

@gerroon commented on GitHub (Jan 23, 2020):

Hmm the latest rc2 fails on me with

2020/01/22 22:52:07 .../xorm/session_raw.go:78:queryRows() [I] [SQL] SELECT `name` FROM `user` WHERE `id`=? LIMIT 1 []interface {}{1} - took: 26.358µs
2020/01/22 22:52:07 .../xorm/session_raw.go:78:queryRows() [I] [SQL] SELECT `name` FROM `user` WHERE `id`=? LIMIT 1 []interface {}{1} - took: 44.862µs
2020/01/22 22:52:07 .../xorm/session_raw.go:78:queryRows() [I] [SQL] SELECT `name` FROM `user` WHERE `id`=? LIMIT 1 []interface {}{1} - took: 34.792µs
2020/01/22 22:52:07 .../xorm/session_raw.go:78:queryRows() [I] [SQL] SELECT `name` FROM `user` WHERE `id`=? LIMIT 1 []interface {}{1} - took: 27.177µs
2020/01/22 22:52:07 ...exer/code/indexer.go:54:func2() [I] PID: 3759700 Initializing Repository Indexer at: /opt/gitea/indexers/repos.bleve
2020/01/22 22:52:07 ...er/issues/indexer.go:142:func2() [I] PID 3759700: Initializing Issue Indexer: bleve
2020/01/22 22:52:07 .../xorm/session_raw.go:78:queryRows() [I] [SQL] SELECT `pull_request`.`id` FROM `pull_request` WHERE (status=?) []interface {}{1} - took: 158.714µs
2020/01/22 22:52:07 .../xorm/session_raw.go:78:queryRows() [I] [SQL] SELECT `id`, `repo_id`, `hook_id`, `uuid`, `type`, `url`, `signature`, `payload_content`, `http_method`, `content_type`, `event_type`, `is_ssl`, `is_delivered`, `delivered`, `is_succeed`, `request_content`, `response_content` FROM `hook_task` WHERE (is_delivered=?) []interface {}{false} - took: 213.028µs
2020/01/22 22:52:07 routers/init.go:122:GlobalInit() [I] SQLite3 Supported
2020/01/22 22:52:07 routers/init.go:46:checkRunMode() [I] Run Mode: Production
2020/01/22 22:52:07 ...ndexer/code/bleve.go:228:Close() [D] Closing repo indexer
2020/01/22 22:52:07 ...ndexer/code/bleve.go:235:Close() [I] PID: 3759700 Repository Indexer closed
2020/01/22 22:52:07 ...exer/code/indexer.go:63:func2() [F] PID: 3759700 Unable to initialize the Repository Indexer at path: /opt/gitea/indexers/repos.bleve Error: error parsing mapping JSON: unexpected end of JSON input
        mapping contents:

        /go/src/code.gitea.io/gitea/modules/indexer/code/indexer.go:63 (0x124255f)
        /usr/local/go/src/runtime/asm_amd64.s:1357 (0x466c70)


@lunny commented on GitHub (Jan 23, 2020):

Could you find the file rupture_sharded_meta.json in the indexer directory?

@gerroon commented on GitHub (Jan 23, 2020):

There is no rupture_sharded_meta.json

find -L -type f|grep -i rupt
./indexers/issues.bleve/rupture_meta.json
./indexers/repos.bleve/rupture_meta.json

@lunny commented on GitHub (Jan 23, 2020):

@gerroon could you paste the content of those two files?

@gerroon commented on GitHub (Jan 23, 2020):

cat issues.bleve/rupture_meta.json  repos.bleve/rupture_meta.json 

{"version":1}{"version":4}
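
The two objects run together because `cat` prints the files back to back and neither appears to end with a newline. Going by the argument order, the first object belongs to issues.bleve and the second to repos.bleve. A small sketch for pulling such back-to-back JSON values apart, using only the standard library (`split_json_objects` is a hypothetical helper name):

```python
import json


def split_json_objects(text):
    """Decode JSON values that are written back to back in one string."""
    decoder = json.JSONDecoder()
    values = []
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it here.
        if text[pos].isspace():
            pos += 1
            continue
        value, pos = decoder.raw_decode(text, pos)
        values.append(value)
    return values


print(split_json_objects('{"version":1}{"version":4}'))
```

Reading each `rupture_meta.json` separately with `json.load` would avoid the ambiguity altogether.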

@gerroon commented on GitHub (Jan 23, 2020):

Ok, I deleted the whole indexer thing and installed the latest nightly (v1.11.0-rc2). The database grew to 3 GB.

-rw-r--r-- 1 git git   47 Jan 23 00:04 index_meta.json
-rw-r--r-- 1 git git   13 Jan 23 00:04 rupture_meta.json
-rw------- 1 git git 3.0G Jan 23 08:27 store

However, it still can't find tool_set.

I did another search for builtin. It found about 30 results across all the Gitea-controlled repos. Since I do not have clones of all the repos, I searched for builtin in the largest one I have cloned. The difference is huge, not even close (30 vs 542).

grep -ir "builtin" *|wc -l
542

@gerroon commented on GitHub (Jan 23, 2020):

One thing I am seeing is that 183.27 K/s 0.00 B/s 0.00 % 95.49 % gitea web -c /opt/gitea/custom/conf/app.ini is doing constant reading (holding 99% of the system I/O) without writing, and never finishing whatever it is doing. The database store file was last updated about 4 hours ago. So whatever is being read from the disk is not being written back, given that the database file has not been updated for about 4 hours?

Here is the lsof for gitea


   1    unix                            33206 type=STREAM
    2    unix                            33206 type=STREAM
    3     REG       0x30     869488   66071775 /media/DRIVE/_TEMP/LOG/gitea/gitea.log
    4 a_inode        0xe          0       8828 [eventpoll]
    5     REG       0x30      31205   66071776 /media/DRIVE/_TEMP/LOG/gitea/macaron.log
    6     REG       0x30          0   66071777 /media/DRIVE/_TEMP/LOG/gitea/router.log
    7     REG       0x30     696800   66071778 /media/DRIVE/_TEMP/LOG/gitea/xorm.log
    8     REG      0x822    2433024    6197630 /media/DRIVEB/opt/gitea/data/gitea.db
    9     REG      0x822          0    6167383 /media/DRIVEB/opt/gitea/data/queues/issue_indexer/LOCK
   10     REG      0x822      28139    6167384 /media/DRIVEB/opt/gitea/data/queues/issue_indexer/LOG
   11    IPv6                                  *:3000
   12     REG      0x822      39378    6163834 /media/DRIVEB/opt/gitea/data/queues/issue_indexer/000102.log
   13     REG      0x822        110    6163840 /media/DRIVEB/opt/gitea/data/queues/issue_indexer/MANIFEST-000103
   14     REG      0x822      15305    6209772 /media/DRIVEB/opt/gitea/data/queues/issue_indexer/000037.ldb
   15     REG      0x822        127    6207818 /media/DRIVEB/opt/gitea/data/queues/issue_indexer/000002.ldb
   16     REG      0x822          0    6167673 /media/DRIVEB/opt/gitea/data/queues/task/LOCK
   17     REG      0x822      26545    6197256 /media/DRIVEB/opt/gitea/data/queues/task/LOG
   18     REG      0x822          0    6164917 /media/DRIVEB/opt/gitea/data/queues/task/000084.log
   19     REG      0x822         70    6164929 /media/DRIVEB/opt/gitea/data/queues/task/MANIFEST-000085
   20     REG      0x822        127    6167385 /media/DRIVEB/opt/gitea/data/queues/task/000002.ldb
   21     REG       0x30    1048576   66074727 /media/DRIVE/GITEA/indexers/issues.bleve/store
   22     REG       0x30 3211452416   66074729 /media/DRIVE/GITEA/indexers/repos.bleve/store
   23    IPv6                                  localhost:3000->localhost:43982
  cwd     DIR       0x30         58     400815 /media/DRIVE/REPO/GITEA
  mem     REG       0x2b              66074727 /media/DRIVE/GITEA/indexers/issues.bleve/store (path dev=0,48)
  mem     REG       0x2b              66074729 /media/DRIVE/GITEA/indexers/repos.bleve/store (path dev=0,48)
  rtd     DIR      0x825       4096          2 /
  txt     REG      0x822   82951528    6056650 /media/DRIVEB/opt/gitea/gitea



@guillep2k commented on GitHub (Jan 23, 2020):

It would be useful to have some logs for the time span of your tests.

EDIT: (I mean, for context)

@gerroon commented on GitHub (Jan 23, 2020):

I would like to, but there is a lot of personal information in the logs: a lot about my projects, issues, wikis, etc. If you can tell me what specifically you are looking for, such as crashes, I can definitely provide it. But I am not seeing any of those there.

@guillep2k commented on GitHub (Jan 24, 2020):

I think I've found an important bug! But it should only manifest itself as repos not being updated (creation of indexes from scratch should not be affected).

As for the error message in my instance:

2020/01/22 21:04:05 ...ndexer/code/queue.go:39:processRepoIndexerOperationQueue() [E] indexer.Index: exit status 1

I've been debugging and it turns out this error is expected as I have one corrupt repo, so git show-ref -s returns.... a silent exit status of 1. I believe this should not affect the indexing of other repos, because the error is logged and the indexer just continues processing its queue.
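
To see this outside Gitea, `git show-ref -s` can be run directly against a repository. When git finds no refs it prints nothing and exits with status 1, which matches the "silent" failure above. A minimal sketch, assuming `git` is on the PATH; the helper names and the wording of the messages are made up for illustration.

```python
import subprocess


def classify_show_ref(returncode, stdout):
    """Interpret the exit status and output of `git show-ref -s`."""
    if returncode == 0:
        return "ok: %d ref(s)" % len(stdout.split())
    if returncode == 1 and not stdout.strip():
        return "no refs found (silent exit status 1)"
    return "failed with exit status %d" % returncode


def show_ref_status(repo_path):
    """Run `git show-ref -s` inside repo_path and classify the outcome."""
    proc = subprocess.run(
        ["git", "show-ref", "-s"],
        cwd=repo_path,
        capture_output=True,
        text=True,
    )
    return classify_show_ref(proc.returncode, proc.stdout)
```

Running `show_ref_status` over every repository directory is one way to find which repo produces the error, assuming the cause is the same as on this instance.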

About the bug I've mentioned, I'll post a PR momentarily.

@gerroon commented on GitHub (Jan 24, 2020):

Sounds good.

I just started from scratch again. This time I added an include files list to limit the scope, since I am mostly interested in txt and py files (my repos have a lot of binary files too). I will report back if that does any good.
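
A rough way to preview which files such an include list lets through is sketched below. The patterns and the sample paths other than `fa_hotkeys.py` are hypothetical, and Python's `fnmatch` is only an approximation: Gitea's own glob matching may treat patterns differently, so this shows the intent rather than the exact behaviour.

```python
from fnmatch import fnmatch

# Hypothetical include list: only text and Python files.
INCLUDE_PATTERNS = ["*.txt", "*.py"]


def is_included(path, patterns=INCLUDE_PATTERNS):
    """True if the file name matches at least one include pattern."""
    name = path.rsplit("/", 1)[-1]
    return any(fnmatch(name, pattern) for pattern in patterns)


# Sample paths; the last two are made up for illustration.
paths = ["MayaConfigV3_2/fa_hotkeys.py", "notes/readme.txt", "assets/model.bin"]
print([p for p in paths if is_included(p)])
```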

@gerroon commented on GitHub (Jan 24, 2020):

Ok, that did not work perfectly either. Here is the result from the Gitea code search page for builtin.transform. I am only including the results from the same repo in both the Code search and the grep search.

One speculation I can make is that Code search seems to return only one result per file (compare it to the grep search), which may be one of the culprits, if not the whole problem.

MayaConfigV3_2/fa_hotkeys.py
View File

     {"properties":
      [("name", 'builtin.transform'),
       ],

QWER/QWER_Industry_Keymap.py
View File

     {"properties":
      [("name", 'builtin.transform'),
       ],

and here is from the terminal

grep -ir "builtin.transform" *

MayaConfigV3_2/fa_hotkeys.py:536:      [("name", 'builtin.transform'),
MayaConfigV3_2/fa_hotkeys.py:2547:      [("name", 'builtin.transform'),
MayaConfigV3_2/fa_hotkeys.py:2708:      [("name", 'builtin.transform'),
MayaConfigV3_2/fa_hotkeys.py:2715:      [("name", 'builtin.transform'),
QWER/QWER_Industry_Keymap.py:1307:      [("name", 'builtin.transform'),
QWER/QWER_Industry_Keymap.py:1314:      [("name", 'builtin.transform'),
QWER/QWER_Industry_Keymap.py:1321:      [("name", 'builtin.transform'),
QWER/QWER_Industry_Keymap.py:2050:      [("name", 'builtin.transform'),
QWER/QWER_Industry_Keymap.py:2057:      [("name", 'builtin.transform'),
QWER/QWER_Industry_Keymap.py:2064:      [("name", 'builtin.transform'),
QWER/QWER_Industry_Keymap.py:5407:      [("name", 'builtin.transform'),
QWER/QWER_Industry_Keymap.py:7050:      [("name", 'builtin.transform'),

It is still not reporting anything about "tool_set" for the repo I listed above, but see what ack returns for the same repo:

ack tool_set *|wc -l
3849

@guillep2k commented on GitHub (Jan 24, 2020):

Oh! 🤦‍♂

The indexer indexes only the first instance of any term per file. It's not meant to be a full text search.
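
This is why the counts in this thread are not comparable: `grep ... | wc -l` counts matching lines, while the Code search page lists files. In the builtin.transform example above that is 12 lines against 2 files. A minimal sketch of the two ways of counting, using made-up file contents:

```python
def count_matches(files, term):
    """Return (files_with_match, matching_lines) for a case-insensitive term."""
    needle = term.lower()
    files_with_match = 0
    matching_lines = 0
    for text in files.values():
        hits = sum(1 for line in text.splitlines() if needle in line.lower())
        if hits:
            files_with_match += 1
            matching_lines += hits
    return files_with_match, matching_lines


# Made-up file contents, for illustration only.
sample = {
    "a.py": "x = 'builtin.transform'\ny = 'builtin.transform'\n",
    "b.py": "z = 'builtin.transform'\n",
    "c.py": "print('nothing here')\n",
}
print(count_matches(sample, "builtin.transform"))
```

Comparing against `grep -rli` (which lists matching files rather than lines) gives a fairer baseline for what the Code search page should show.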

@gerroon commented on GitHub (Jan 24, 2020):

Interesting. Then maybe it is not even going to return partial results?

Here tool_set_by_name returns 2 results from the whole of Gitea, while grep can return many for a single repo. Maybe that explains in some way why "tool_set" returns none?

@guillep2k commented on GitHub (Jan 24, 2020):

It should return a result per file where it occurs, as long as it's in master (or whatever branch is your default) and "indexable" (i.e. not filtered out by your settings or ... ahem ... perhaps your files are marked as executable?). 😳

@zeripath commented on GitHub (Jan 24, 2020):

as per @guillep2k

@vvrein commented on GitHub (Jan 25, 2020):

May this https://github.com/go-gitea/gitea/issues/9190#issuecomment-571563226 be related to this issue?

@vvrein commented on GitHub (Jan 25, 2020):

Re-checked https://github.com/go-gitea/gitea/issues/9190#issuecomment-571563226 behavior with latest upstream version
1.12.0+dev-174-g5b17bb8f3
Seems working now!
Repo index was updated after git push

@zeripath commented on GitHub (Jan 25, 2020):

@vvrein I'm gonna close this as Fixed by #9965 and #9957

Reference: github-starred/gitea#4707