New batch migration causes error on sqlite #3533

Closed
opened 2025-11-02 05:16:06 -06:00 by GiteaMirror · 8 comments
Owner

Originally created by @mrsdizzie on GitHub (Jul 3, 2019).

(cc @lunny)

With the changes in #7050, I now get these when testing a migration:

2019/07/03 11:52:30 routers/repo/repo.go:315:MigratePost() [E] MigratePost: too many SQL variables

My example happened for comments, based on this INSERT statement (I think):

https://github.com/go-gitea/gitea/blob/b5aa7f7ceb4ca828f50395e404e19c5ba7679268/models/migrate.go#L121-L127

Here is the full gist:

https://gist.github.com/mrsdizzie/681ea0295c11350fea4244a4289665ef

This is just testing migrating the "tea" repo here: https://github.com/go-gitea/tea

But it could maybe happen in other places too for similar reasons. For each comment there would be a few dozen variables in that SQL statement, so it breaks depending on how many comments there are. Each comment now adds about 22 variables to the statement (one for each column), so even something like 50 total comments (say 10 issues with 5 comments each) is enough to trigger this error, since I believe the default SQLITE_MAX_VARIABLE_NUMBER is 999 (22 * 50 = 1,100).

I know this is essentially a limit of SQLite and probably wouldn't happen with other databases -- but it's still an issue (and it makes testing new migration features difficult, since it is nice to use SQLite locally for development).

Maybe we can detect SQLite and use the old method (or limit it to a known-good number, like 25 comments at a time)?
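To illustrate the arithmetic behind the error (a hypothetical sketch, not Gitea's actual code): a multi-row INSERT binds one `?` placeholder per column per row, so the total variable count is rows * columns, and 50 rows of 22 columns already exceeds SQLite's default limit of 999.

```go
package main

import (
	"fmt"
	"strings"
)

// buildInsert builds a multi-row INSERT statement and reports how many `?`
// bind variables it uses. Illustrative only; not Gitea's migration code.
func buildInsert(table string, cols []string, rows int) (string, int) {
	rowPh := "(" + strings.TrimRight(strings.Repeat("?,", len(cols)), ",") + ")"
	allPh := make([]string, rows)
	for i := range allPh {
		allPh[i] = rowPh
	}
	sql := fmt.Sprintf("INSERT INTO %s (%s) VALUES %s",
		table, strings.Join(cols, ","), strings.Join(allPh, ","))
	return sql, rows * len(cols)
}

func main() {
	cols := make([]string, 22) // the comment table had about 22 columns at the time
	for i := range cols {
		cols[i] = fmt.Sprintf("c%d", i)
	}
	_, vars := buildInsert("comment", cols, 50)
	// 1100 variables > 999, the default SQLITE_MAX_VARIABLE_NUMBER
	fmt.Println(vars, vars > 999)
}
```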

GiteaMirror added the type/bug label 2025-11-02 05:16:06 -06:00

@lunny commented on GitHub (Jul 4, 2019):

The batch size for inserting comments is 100. Maybe I should lower it to 50.

@lunny commented on GitHub (Jul 4, 2019):

I tested locally, and it seems OK to migrate github.com/go-gitea/tea with an SQLite database. I'm on macOS.

@lunny commented on GitHub (Jul 4, 2019):

@mrsdizzie could you confirm whether #7353 fixes your issue? You need to change the default SAVE_BATCH_SIZE.

@zeripath commented on GitHub (Jul 4, 2019):

We should set the batch size to something that works for SQLite out of the box, since that's the default database.

@mrsdizzie commented on GitHub (Jul 5, 2019):

I'm not the most knowledgeable about all of this, but from some reading about the error I think the gist is that for each column in a row, there will be a ? variable in the SQL statement. The more columns, the more ?s, and the count multiplies as each inserted row adds another full set of them (see the gist I posted above). Once there are more than 999 ?s, SQLite will throw an error.

@lunny the test for tea might work for you and fail for me because I was testing a PR that adds two more columns to the comment table, so there are more variables for each comment inserted and it hits the SQLite limit faster. In my example above there are 1,035 ? variables from 45 imported comments. If you remove the two columns I added for my PR, there would only be 945 (1,035 - 90) and it would not hit the error.

I think the real issue here is that it isn't about the number of rows you are trying to insert but how many columns each of those rows has. Inserting 100 rows at a time will work fine for a table that has a few columns but will have trouble for a larger table like comments which as of my PR now has 23 columns.

While setting a global limit could help in my situation, since I could lower the number, it would involve guessing for most users, because the problem isn't based on the number of rows inserted but on the number of columns in those rows times the number of rows. I think it would be better if the code, knowing the comment table has x columns, never inserts more rows than that column count can handle.

If not, the global limit would probably need to be based on the table with the largest number of columns that can have a lot of rows imported. In the case from my example, the limit would have to be 43 (23 * 43 = 989). And then it would need to be lowered again if another column were added to the comment table. I'm not aware of larger tables that also insert lots of rows at once, but if there are, it would need to be lower still.

Alternatively, setting the default for SQLite to something lower, like 25, would probably avoid getting close to those limits without worrying about it breaking on an update when somebody adds a column.

Sorry for the long response; I was just trying to work all of that out in my head.
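The per-table calculation described above can be sketched like this (hypothetical Go, not Gitea's actual API): derive the largest safe batch size from SQLite's default variable limit and the table's column count, rather than hard-coding a row count.

```go
package main

import "fmt"

// sqliteMaxVars is SQLite's default SQLITE_MAX_VARIABLE_NUMBER.
const sqliteMaxVars = 999

// maxRowsPerInsert returns the largest batch size whose total bind-variable
// count (rows * columns) stays within the SQLite limit.
func maxRowsPerInsert(numColumns int) int {
	if numColumns <= 0 {
		return 0
	}
	return sqliteMaxVars / numColumns
}

func main() {
	fmt.Println(maxRowsPerInsert(22)) // 45: 45 * 22 = 990 <= 999
	fmt.Println(maxRowsPerInsert(23)) // 43: 43 * 23 = 989 <= 999
}
```

This matches the numbers worked out in the comment above: with 23 columns the safe batch size drops to 43, and it shrinks again whenever a column is added.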


@lunny commented on GitHub (Jul 6, 2019):

In fact, we know the column count of the `issue` and `comment` tables.

@mrsdizzie commented on GitHub (Jul 6, 2019):

@lunny then I think we should add code at the insert site so it never uses more rows than will fit under the limit (where rows * columns < 999). It could be conditional on SQLite too. If there are no other places that try to do large inserts at once, then just adding that code to migrations should be enough to fix this.
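A minimal sketch of that chunking idea (hypothetical names, not Gitea's actual migration code): split the rows into batches whose rows * columns product stays under the variable limit, and hand each batch to whatever bulk-insert call the migration uses.

```go
package main

import "fmt"

// insertInChunks splits rows into batches small enough that
// rows * numColumns stays within maxVars, then calls insertBatch on each.
// insertBatch stands in for the real bulk-insert function.
func insertInChunks(rows []int, numColumns, maxVars int,
	insertBatch func([]int) error) error {
	batchSize := maxVars / numColumns
	for start := 0; start < len(rows); start += batchSize {
		end := start + batchSize
		if end > len(rows) {
			end = len(rows)
		}
		if err := insertBatch(rows[start:end]); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	var batches [][]int
	rows := make([]int, 100) // e.g. 100 comments with 23 columns each
	insertInChunks(rows, 23, 999, func(b []int) error {
		batches = append(batches, b)
		return nil
	})
	fmt.Println(len(batches)) // 3 batches: 43 + 43 + 14 rows
}
```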

@lunny commented on GitHub (Jul 6, 2019):

@mrsdizzie I updated #7353


Reference: github-starred/gitea#3533