unexpected slow iops operations #14732

Open
opened 2025-11-02 11:21:29 -06:00 by GiteaMirror · 8 comments
Owner

Originally created by @mosinnik on GitHub (Jul 9, 2025).

Description

I use Gitea with SQLite on HDD disks, and it performs I/O operations that keep the disks 30-50% busy (busy factor).
In logs I see repeatedly

2025/07/09 19:56:37,stdout,"2025/07/09 16:56:37 .../connect@v1.18.1/handler.go:69:NewUnaryHandler.2() [W] [Slow SQL Query] UPDATE `action_runner` SET `last_online` = ?, `updated` = ? WHERE `id`=? AND (`deleted`=? OR `deleted` IS NULL) [1752080190 1752080190 1 0] - 6.823684431s
2025/07/09 19:56:33,stdout,"2025/07/09 16:56:33 HTTPRequest [W] router: slow      POST /api/actions/runner.v1.RunnerService/FetchTask for 172.25.0.2:49530, elapsed 3628.8ms @ <autogenerated>:1(http.Handler.ServeHTTP-fm)
2025/07/09 19:56:25,stdout,"2025/07/09 16:56:25 HTTPRequest [W] router: slow      POST /api/actions/runner.v1.RunnerService/FetchTask for 172.25.0.2:49530, elapsed 3068.4ms @ <autogenerated>:1(http.Handler.ServeHTTP-fm)
2025/07/09 19:56:25,stdout,"2025/07/09 16:56:24 models/actions/runner.go:258:GetRunnerByUUID() [W] [Slow SQL Query] SELECT `id`, `uuid`, `name`, `version`, `owner_id`, `repo_id`, `description`, `base`, `repo_range`, `token_hash`, `token_salt`, `last_online`, `last_active`, `agent_labels`, `ephemeral`, `created`, `updated`, `deleted` FROM `action_runner` WHERE (uuid=?) AND (`deleted`=? OR `deleted` IS NULL) LIMIT 1 [3166f733-eaaa-4abc-9ecd-534d34af25fc 0] - 7.110539592s
2025/07/09 19:56:25,stdout,"2025/07/09 16:56:24 .../connect@v1.18.1/handler.go:69:NewUnaryHandler.2() [W] [Slow SQL Query] UPDATE `action_runner` SET `last_online` = ?, `updated` = ? WHERE `id`=? AND (`deleted`=? OR `deleted` IS NULL) [1752080176 1752080176 1 0] - 8.618658501s
2025/07/09 19:56:20,stdout,"2025/07/09 16:56:20 HTTPRequest [W] router: slow      POST /api/actions/runner.v1.RunnerService/FetchTask for 172.25.0.2:49528, elapsed 3069.1ms @ <autogenerated>:1(http.Handler.ServeHTTP-fm)
2025/07/09 19:56:18,stdout,"2025/07/09 16:56:16 .../connect@v1.18.1/handler.go:69:NewUnaryHandler.2() [W] [Slow SQL Query] UPDATE `action_runner` SET `last_online` = ?, `updated` = ? WHERE `id`=? AND (`deleted`=? OR `deleted` IS NULL) [1752080170 1752080170 1 0] - 6.187601118s

I'm the only user, I have 2 projects (one mirrored from GitHub at a 10-minute interval) and one runner. The server has 2 cores and 32 GB RAM, so such slow behavior is unexpected.

So, is there a way to make fewer queries to SQLite?

I don't want to set up any external databases; I just want these queries to stop running so frequently.

Gitea Version

1.24.2

Can you reproduce the bug on the Gitea demo site?

No

Log Gist

No response

Screenshots

No response

Git Version

No response

Operating System

No response

How are you running Gitea?

docker compose

version: "3"

services:
  gitea-server:
    image: gitea/gitea:1.24.2
    container_name: gitea
    environment:
      - USER_UID=1000
      - USER_GID=1000
      - LOCAL_ROOT_URL=http://----:13000/
      - ROOT_URL=http://----:13000/
    restart: always
    volumes:
      - ./server/data:/data
      - ./server/config:/etc/gitea
    ports:
      - "13000:3000"
      - "13022:22"
  
  gitea-runner:
    image: gitea/act_runner:0.2.12
    environment:
      CONFIG_FILE: /config.yaml
      GITEA_INSTANCE_URL: "http://gitea-server:3000/"
      GITEA_RUNNER_REGISTRATION_TOKEN: ----
      GITEA_RUNNER_NAME: "runner_1"
      GITEA_RUNNER_LABELS: "runner_1"
    volumes:
      - ./runner/config.yaml:/config.yaml
      - ./runner/data:/data
      - /var/run/docker.sock:/var/run/docker.sock

Database

SQLite

GiteaMirror added the type/bug label 2025-11-02 11:21:29 -06:00

@lunny commented on GitHub (Jul 9, 2025):

Are there any other slow SQL queries besides the Actions-related ones?


@mosinnik commented on GitHub (Jul 9, 2025):

@lunny as far as I can see, every SQL query is slow. I attached the full log:

gitea.zip


@mosinnik commented on GitHub (Jul 9, 2025):

I know that the main problem is that the disks are slow at random writes, but no writes are expected at all when nothing is really happening on Gitea.


@ChristopherHX commented on GitHub (Jul 9, 2025):

So, is there a way to make fewer queries to SQLite?

Yes: an autoscaler that waits externally for the workflow_job webhook and runs an act_runner for a single job, i.e. zero idle act_runners polling your database.

This would be event-based. Once you receive the HTTP request, start the runner for one job; it polls once, executes the job, then exits. It can be a very small program.

On a Raspberry Pi 4 I also had problems with Actions polling, once I had 100 runners polling at almost the same second.

Jobs were no longer picked up reliably; stronger hardware was not affected.

*Once I fixed job pickup, Gitea became usable, since active jobs increase database pressure as well.


However, I do not currently have any examples that do this with Docker; most container setups seem to prefer Kubernetes or other technologies such as Linux containers (LXD, Incus).

I made a table of k8s autoscalers in https://github.com/go-gitea/gitea/issues/29567

garm can scale other environments too, but I'm not sure about Docker containers.

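The event-driven approach described above can be sketched as a tiny webhook listener. This is a hypothetical illustration, not a tested recipe: the `workflow_job` payload shape, the `X-Gitea-Event` header check, and the ephemeral-runner environment variable are assumptions to verify against your Gitea and act_runner versions.

```python
# Hypothetical sketch: listen for Gitea's workflow_job webhook and start one
# ephemeral act_runner container per queued job, instead of keeping an idle
# runner polling the database every few seconds.
import json
import subprocess
from http.server import BaseHTTPRequestHandler, HTTPServer


def is_queued_job(event_type, payload):
    """Return True when the webhook signals a newly queued workflow job."""
    return event_type == "workflow_job" and payload.get("action") == "queued"


def launch_ephemeral_runner():
    # Assumed invocation: flags and env vars depend on your act_runner version.
    subprocess.Popen([
        "docker", "run", "--rm",
        "-e", "GITEA_INSTANCE_URL=http://gitea-server:3000/",
        "-e", "GITEA_RUNNER_REGISTRATION_TOKEN=<token>",  # fill in your token
        "-e", "GITEA_RUNNER_EPHEMERAL=true",  # pick up one job, then exit
        "gitea/act_runner:0.2.12",
    ])


class Hook(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        event = self.headers.get("X-Gitea-Event", "")
        if is_queued_job(event, json.loads(body or b"{}")):
            launch_ephemeral_runner()
        self.send_response(204)
        self.end_headers()


def serve(port=8080):
    """Block forever serving webhook deliveries on the given port."""
    HTTPServer(("0.0.0.0", port), Hook).serve_forever()
```

Pointing a repository webhook at `serve()` and dropping the always-on `gitea-runner` service from the compose file would remove the idle FetchTask polling entirely, at the cost of a cold-start delay per job.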

@lunny commented on GitHub (Jul 9, 2025):

I know that the main problem is that the disks are slow at random writes, but no writes are expected at all when nothing is really happening on Gitea.

The disk I/O appears to be quite slow. Could you try writing a simple demo application to test the SQLite database query performance on that disk?
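Such a demo could look like the following minimal sketch (not Gitea code): it mimics the repeated `UPDATE action_runner SET last_online = ...` from the logs, with the table name and columns copied from them, to measure raw commit latency on the target disk.

```python
# Minimal SQLite write benchmark: time N committed UPDATEs against a database
# file on the disk under test, mimicking Gitea's runner heartbeat write.
import sqlite3
import time


def bench_updates(db_path, iterations=100):
    """Return the average seconds per committed UPDATE on db_path."""
    con = sqlite3.connect(db_path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS action_runner "
        "(id INTEGER PRIMARY KEY, last_online INTEGER, updated INTEGER)"
    )
    con.execute("INSERT OR IGNORE INTO action_runner VALUES (1, 0, 0)")
    con.commit()
    start = time.perf_counter()
    for _ in range(iterations):
        now = int(time.time())
        con.execute(
            "UPDATE action_runner SET last_online = ?, updated = ? WHERE id = ?",
            (now, now, 1),
        )
        con.commit()  # each commit forces a sync to disk
    elapsed = time.perf_counter() - start
    con.close()
    return elapsed / iterations


if __name__ == "__main__":
    print(f"avg commit latency: {bench_updates('bench.db'):.6f}s")
```

If the average latency on the HDD is already in the hundreds of milliseconds under parallel I/O load, that would confirm the disk, not Gitea's query volume, as the dominant factor.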


@mosinnik commented on GitHub (Jul 9, 2025):

@lunny is this a joke? Gitea itself is what you need: it already uses SQLite, and it is slow. Just set up disks that are slow at writing (a 5400 rpm HDD, plus some parallel work like fio random writes to keep the disks busy during the test). Add one idle act_runner, maybe some projects and a workflow. I don't know why Gitea does what it does, and I can't help with that.


@mosinnik commented on GitHub (Jul 9, 2025):

but not sure about docker containers.

It's not about Docker/k8s/LXC/etc. It's about how SQLite is used, and about a lot of unneeded SQLite usage. For example: why does Gitea need to check act_runner statuses even when no action needs to be executed, and why must that status be STORED to disk via SQLite? Why is it not in-memory state? And why not just check runners before launching an action, or at least give users the option to check that way instead of every N seconds?


@ChristopherHX commented on GitHub (Jul 10, 2025):

why it is not an inmemory state?

I was not part of this decision.


My independent "GitHub Actions Emulator" uses in-memory queues, but it does not have repo scoping or org and global runners, where the number of in-memory queues grows exponentially on systems with more runners.


The fast polling rate of every 5s (it can be changed in the act_runner config) is not what I expected.

I am used to long polling, which keeps the connection to the runner open for up to 50s. That allows much more in-memory magic.
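Since the compose file above already mounts `./runner/config.yaml`, the polling interval could likely be relaxed there. A hedged sketch follows; the key names are assumed from act_runner's generated sample config, so verify them with `act_runner generate-config` for your version.

```yaml
# Hypothetical excerpt of act_runner's config.yaml (mounted as
# ./runner/config.yaml in the compose file above).
runner:
  # How often the runner polls Gitea for new tasks. Raising this reduces
  # the FetchTask requests and SQLite writes seen in the logs, at the
  # cost of slower job pickup.
  fetch_interval: 30s
  fetch_timeout: 5s
```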


I do not currently have any concept to replace the Gitea implementation here, so I just pointed to what I know as a way to bypass this problem.

Maybe this also raises the bar for me to investigate the database-based queue further, since in my view there are more important things to do, like:

  • concurrency
  • reusable workflows on the Gitea server side instead of in act_runner

Reference: github-starred/gitea#14732