mirror of https://github.com/go-vikunja/vikunja.git (synced 2026-03-09 07:13:35 -05:00)

docs: add wip plans

plans/fix-s3-unsigned-payload.md (new file, 240 lines)
# Fix S3 XAmzContentSHA256Mismatch for S3-Compatible Stores

**Goal:** Add a configuration option to disable S3 payload signing (use `UNSIGNED-PAYLOAD`), fixing uploads to S3-compatible stores like Cellar/Ceph that fail with `XAmzContentSHA256Mismatch`.

**Architecture:** The AWS SDK for Go v2 computes a SHA256 hash of the request body and sends it in the `X-Amz-Content-Sha256` header for SigV4 signing. Some S3-compatible stores (Ceph RadosGW, Cellar, etc.) don't verify this hash correctly, causing `XAmzContentSHA256Mismatch` errors. The SDK provides `v4.SwapComputePayloadSHA256ForUnsignedPayloadMiddleware` to replace the hash with `UNSIGNED-PAYLOAD`, which is safe over HTTPS because TLS already guarantees body integrity. We add a `files.s3.disablesigning` config option that applies this middleware when creating the S3 client.

**Tech Stack:** Go, AWS SDK for Go v2 (`aws/signer/v4`), viper config

---
### Task 1: Add the config key

**Files:**

- Modify: `pkg/config/config.go:170` (add key constant)
- Modify: `pkg/config/config.go:448` (add default)

**Step 1: Add the config key constant**

Add after `FilesS3UsePathStyle` (line 170) in `pkg/config/config.go`:
```go
	FilesS3UsePathStyle   Key = `files.s3.usepathstyle`
	FilesS3DisableSigning Key = `files.s3.disablesigning`
	FilesS3TempDir        Key = `files.s3.tempdir`
```

**Step 2: Add the default value**

Add after `FilesS3UsePathStyle.setDefault(false)` (line 448) in `pkg/config/config.go`:

```go
	FilesS3UsePathStyle.setDefault(false)
	FilesS3DisableSigning.setDefault(false)
	FilesS3TempDir.setDefault("")
```
**Step 3: Commit**

```bash
git add pkg/config/config.go
git commit -m "feat: add files.s3.disablesigning config key"
```

---
### Task 2: Add config to `config-raw.json`

This file generates `config.yml.sample` and documents the option for users.

**Files:**

- Modify: `config-raw.json` (add entry after `usepathstyle` in the `s3` children array, around line 531)

**Step 1: Add the config entry**

Insert after the `usepathstyle` entry (lines 527-531) in `config-raw.json`:
```json
{
	"key": "disablesigning",
	"default_value": "false",
	"comment": "When enabled, the S3 client will send UNSIGNED-PAYLOAD instead of computing a SHA256 hash for request signing. Some S3-compatible providers (such as Ceph RadosGW, Clever Cloud Cellar) do not correctly verify payload signatures and return XAmzContentSHA256Mismatch errors. Enabling this option works around the issue. Only enable it over HTTPS, since TLS then provides the payload integrity that signing would otherwise guarantee."
}
```

**Step 2: Commit**

```bash
git add config-raw.json
git commit -m "docs: add files.s3.disablesigning to config template"
```

---
### Task 3: Write the failing test

**Files:**

- Modify: `pkg/files/s3_test.go` (add test at the end)

**Step 1: Write the test**

Add to the end of `pkg/files/s3_test.go`:
```go
func TestInitS3FileHandler_DisableSigningOption(t *testing.T) {
	// Save the original config so the test leaves global state untouched.
	originalType := config.FilesType.GetString()
	originalEndpoint := config.FilesS3Endpoint.GetString()
	originalBucket := config.FilesS3Bucket.GetString()
	originalAccessKey := config.FilesS3AccessKey.GetString()
	originalSecretKey := config.FilesS3SecretKey.GetString()
	originalDisableSigning := config.FilesS3DisableSigning.GetBool()
	originalClient := s3Client

	defer func() {
		config.FilesType.Set(originalType)
		config.FilesS3Endpoint.Set(originalEndpoint)
		config.FilesS3Bucket.Set(originalBucket)
		config.FilesS3AccessKey.Set(originalAccessKey)
		config.FilesS3SecretKey.Set(originalSecretKey)
		config.FilesS3DisableSigning.Set(originalDisableSigning)
		s3Client = originalClient
	}()

	config.FilesS3Endpoint.Set("https://fake-endpoint.example.com")
	config.FilesS3Bucket.Set("test-bucket")
	config.FilesS3AccessKey.Set("test-key")
	config.FilesS3SecretKey.Set("test-secret")
	config.FilesS3DisableSigning.Set(true)

	// initS3FileHandler only builds the client and never talks to the
	// endpoint, so the fake endpoint is fine here; endpoint validation
	// happens later, in InitFileHandler. We can't easily verify the
	// middleware was applied without issuing a real request, so we only
	// assert that client creation succeeds with the option enabled.
	err := initS3FileHandler()
	assert.NoError(t, err)
	assert.NotNil(t, s3Client)
}
```
**Step 2: Run the test to verify it fails**

Run: `mage test:filter TestInitS3FileHandler_DisableSigningOption`

Expected: a compile error (`config.FilesS3DisableSigning` is undefined) if Task 1 hasn't been done yet; if Task 1 is already in place, the test passes and serves as a regression guard instead.

**Step 3: Commit**

```bash
git add pkg/files/s3_test.go
git commit -m "test: add test for S3 disable signing config option"
```

---
### Task 4: Apply the middleware in `initS3FileHandler`

**Files:**

- Modify: `pkg/files/filehandling.go:96-100` (add middleware option)
- Modify: `pkg/files/filehandling.go` imports (add `v4` signer import)

**Step 1: Add the v4 signer import**

Add to the imports in `pkg/files/filehandling.go`:

```go
	v4 "github.com/aws/aws-sdk-go-v2/aws/signer/v4"
```

**Step 2: Apply the middleware when config is enabled**

Replace the S3 client creation block (lines 97-100) in `pkg/files/filehandling.go`:
```go
// Before:
client := s3.NewFromConfig(cfg, func(o *s3.Options) {
	o.BaseEndpoint = aws.String(endpoint)
	o.UsePathStyle = config.FilesS3UsePathStyle.GetBool()
})

// After:
client := s3.NewFromConfig(cfg, func(o *s3.Options) {
	o.BaseEndpoint = aws.String(endpoint)
	o.UsePathStyle = config.FilesS3UsePathStyle.GetBool()
	if config.FilesS3DisableSigning.GetBool() {
		o.APIOptions = append(o.APIOptions, v4.SwapComputePayloadSHA256ForUnsignedPayloadMiddleware)
	}
})
```

This replaces the `ComputePayloadSHA256` middleware with `UnsignedPayload` for all requests made by this client. Instead of computing the body's SHA256 and setting it in `X-Amz-Content-Sha256`, the client sends the literal value `UNSIGNED-PAYLOAD`. This is safe over HTTPS since TLS provides integrity guarantees.
**Step 3: Run the test**

Run: `mage test:filter TestInitS3FileHandler_DisableSigningOption`

Expected: PASS

**Step 4: Run the full file test suite**

Run: `mage test:filter TestFileSave_S3`

Expected: PASS (existing tests are unaffected since the config defaults to false)

**Step 5: Run lint**

Run: `mage lint`

Expected: PASS

**Step 6: Commit**

```bash
git add pkg/files/filehandling.go
git commit -m "feat: support UNSIGNED-PAYLOAD for S3-compatible stores

Add files.s3.disablesigning config option that replaces the SHA256 payload
hash with UNSIGNED-PAYLOAD in S3 requests. This fixes XAmzContentSHA256Mismatch
errors with S3-compatible providers like Ceph RadosGW and Clever Cloud Cellar."
```

---
### Task 5: Run the config init tests

**Files:** (no changes, verification only)

**Step 1: Run all S3 config tests**

Run: `mage test:filter TestInitFileHandler_S3`

Expected: PASS -- existing config validation tests should still pass since `disablesigning` is only read, never validated as required.

**Step 2: Run full lint**

Run: `mage lint`

Expected: PASS

---
### Summary of changes

| File | Change |
|------|--------|
| `pkg/config/config.go` | Add `FilesS3DisableSigning` key constant and `false` default |
| `config-raw.json` | Add `disablesigning` entry with documentation comment |
| `pkg/files/filehandling.go` | Import `v4` signer, apply `SwapComputePayloadSHA256ForUnsignedPayloadMiddleware` when config is true |
| `pkg/files/s3_test.go` | Add `TestInitS3FileHandler_DisableSigningOption` test |

### User-facing documentation

Users who encounter `XAmzContentSHA256Mismatch` errors should add to their `config.yml`:

```yaml
files:
  type: s3
  s3:
    endpoint: "https://..."
    bucket: "..."
    accesskey: "..."
    secretkey: "..."
    disablesigning: true
```
plans/fix-s3-upload-error-handling.md (new file, 121 lines)
# Fix Frontend S3 Upload Error Handling Implementation Plan

**Goal:** Show meaningful error notifications when file attachment uploads fail, instead of silently swallowing errors.

**Architecture:** The backend already returns `{errors: [...], success: [...]}` in the response body (HTTP 200) for batch upload semantics. The frontend helper `uploadFiles()` detects errors and throws, but the component that calls it never awaits or catches. Fix the caller to handle errors, and fix the error formatting to produce readable messages.

**Tech Stack:** Vue 3 + TypeScript, vue-i18n

---

### Task 1: Fix error message formatting in `uploadFiles()`

The current code `throw Error(response.errors)` passes an array of `{code, message}` objects to `Error()`, which coerces them to `"[object Object]"`. Instead, extract the message strings and join them.

**Files:**

- Modify: `frontend/src/helpers/attachments.ts:32-34`

**Step 1: Fix the error throw to produce a readable message**

Replace lines 32-34 in `frontend/src/helpers/attachments.ts`:
```ts
// Before:
if (response.errors !== null) {
	throw Error(response.errors)
}

// After:
if (response.errors !== null) {
	const messages = response.errors.map((e: {message: string}) => e.message)
	throw new Error(messages.join('\n'))
}
```
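The coercion is easy to reproduce in isolation; a small sketch (the error object here is illustrative, matching the `{code, message}` shape described above):

```typescript
const errors = [{code: 4001, message: 'File is too large.'}]

// Error() stringifies its argument, so an object array collapses to
// "[object Object]":
const bad = new Error(errors as unknown as string)
console.log(bad.message)

// Mapping to .message first keeps the text readable:
const good = new Error(errors.map(e => e.message).join('\n'))
console.log(good.message)
```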
**Step 2: Run frontend lint to verify**

Run: `cd frontend && pnpm lint`

Expected: PASS (no new lint errors)

**Step 3: Commit**

```bash
git add frontend/src/helpers/attachments.ts
git commit -m "fix: format attachment upload error messages as readable strings"
```

---
### Task 2: Add error handling in `Attachments.vue` upload caller

The function `uploadFilesToTask()` at lines 336-338 calls `uploadFiles()` without `await` or error handling, so the promise rejection goes unhandled. Add `try/catch` with the existing `error()` notification function (already imported at line 192).

**Files:**

- Modify: `frontend/src/components/tasks/partials/Attachments.vue:336-338`

**Step 1: Add try/catch to `uploadFilesToTask`**

Replace lines 336-338:
```ts
// Before:
function uploadFilesToTask(files: File[] | FileList) {
	uploadFiles(attachmentService, props.task.id, files)
}

// After:
async function uploadFilesToTask(files: File[] | FileList) {
	try {
		await uploadFiles(attachmentService, props.task.id, files)
	} catch (e) {
		error(e)
	}
}
```

This mirrors the existing pattern used in `deleteAttachment()` at lines 346-358 in the same file. The `error` import from `@/message` is already present at line 192.
**Step 2: Run frontend lint to verify**

Run: `cd frontend && pnpm lint`

Expected: PASS

**Step 3: Run frontend type check**

Run: `cd frontend && pnpm typecheck`

Expected: PASS

**Step 4: Commit**

```bash
git add frontend/src/components/tasks/partials/Attachments.vue
git commit -m "fix: handle attachment upload errors with user-visible notifications"
```

---
### Task 3: Manual verification

There are no existing unit tests for `uploadFilesToTask` or `uploadFiles`. Verify manually:

**Step 1: Verify error display with the backend returning errors**

1. Start the dev server: `cd frontend && pnpm dev`
2. Upload a file to a task
3. If using S3 with a misconfigured endpoint, the error should now show as a toast notification at the bottom left
4. If using local storage, simulate a failure by temporarily making the files directory read-only, then uploading

**Step 2: Verify partial success still works**

1. Upload multiple files where one exceeds the size limit
2. The successful files should still appear as attachments
3. The failed file(s) should show an error toast with the message from the backend

---

### Summary of changes

| File | Change |
|------|--------|
| `frontend/src/helpers/attachments.ts:32-34` | Format error array into readable message string |
| `frontend/src/components/tasks/partials/Attachments.vue:336-338` | Add `async`, `await`, `try/catch` with `error()` notification |
plans/impl-task-index-counter-table.md (new file, 1008 lines; diff suppressed because it is too large)

plans/investigation-migration-bugs.md (new file, 255 lines)
# Investigation: Database Migrations Recorded But Not Applied

**Issue Reference:** https://github.com/go-vikunja/vikunja/issues/2172

**Date:** 2026-02-04

## Problem Statement

Users report that some database migrations appear in the `migration` table but their effects are not present in the database. Specifically:

1. **Reporter 1 (SamBouwer):** Duplicate entries in the migrations table — all migrations appear twice, with two `SCHEMA_INIT` rows. The `bucket_configuration` column contains old-format string filters instead of the expected JSON object format.

2. **Reporter 2 (freeware-superman):** All migrations after `20240919130957` are completely missing from the migrations table (roughly 11 migrations), despite being known to the binary. No duplicate entries.

---

## Root Causes Identified

Four distinct problems were found:
### Problem 1: Race Condition — No Locking, No Unique Constraint

**Location:** `xormigrate@v1.7.1/xormigrate.go:287-308` and `:358-370`

The `Migration` struct definition:

```go
type Migration struct {
	ID string `xorm:"id"` // Just a column name, NOT a primary key!
	// ...
}
```

The tag `xorm:"id"` is **only a column name mapping**, not a primary key declaration. XORM only auto-detects primary keys for untagged `int64` fields named `ID`. The `migration` table therefore has **no UNIQUE constraint and no PRIMARY KEY** on the `id` column.
Combined with zero locking in the migration process:

1. `canInitializeSchema()` checks `COUNT(*) == 0` on the migrations table
2. `runInitSchema()` inserts `SCHEMA_INIT` + all migration IDs
3. There is no advisory lock, no `SELECT FOR UPDATE`, no mutex between steps 1 and 2

**Race scenario:**

If two Vikunja instances start concurrently (Docker restart, Kubernetes scaling, systemd `Restart=always`):

1. Instance A creates the migration table, sees it's empty
2. Instance B sees the table exists and also sees it's empty (A hasn't inserted yet)
3. Instance A runs `initSchema`, inserts all migration IDs
4. Instance B runs `initSchema`, inserts all migration IDs again

Result: Every migration ID appears twice, including two `SCHEMA_INIT` rows.

**This explains Reporter 1's duplicate entries.**
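An affected database can be checked for this state directly (assuming the default xormigrate table name `migration`):

```sql
-- Any id returned here was recorded more than once:
SELECT id, COUNT(*) AS copies
FROM migration
GROUP BY id
HAVING COUNT(*) > 1;
```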
---

### Problem 2: `initSchema` Records Migrations Without Executing Them

**Location:** `xormigrate@v1.7.1/xormigrate.go:287-308`

The `runInitSchema` function:
```go
func (x *Xormigrate) runInitSchema() error {
	// 1. Run initSchema - creates tables via Sync2 (DDL only)
	if err := x.initSchema(x.db); err != nil {
		return err
	}

	// 2. Insert SCHEMA_INIT marker
	if err := x.insertMigration(initSchemaMigrationId); err != nil {
		return err
	}

	// 3. Insert ALL migration IDs WITHOUT executing migration functions
	for _, migration := range x.migrations {
		if err := x.insertMigration(migration.ID); err != nil {
			return err
		}
	}
	return nil
}
```
This is **by design** for fresh databases (no data to migrate). But it causes problems when:

- The migrations table gets corrupted/emptied while data still exists
- Concurrent starts cause the race condition above

In the race scenario, if Instance A completes `initSchema` and inserts all IDs, then Instance B's `canInitializeSchema()` check sees `count > 0` and falls through to the per-migration loop. There, `migrationDidRun()` returns `true` for every migration (the IDs exist from Instance A), so **no migration function ever runs**.

But `initSchema` only creates schema (DDL) — it doesn't perform data transformations. Any existing data with old-format filters stays unconverted.

**This explains why Reporter 1's `bucket_configuration` has old-format strings despite migrations being "recorded".**

---
### Problem 3: Non-Transactional Migration Execution

**Location:** `xormigrate@v1.7.1/xormigrate.go:310-339`

All Vikunja migrations use the `Migrate` field (engine path):
```go
func (x *Xormigrate) runMigration(migration *Migration) error {
	if !x.migrationDidRun(migration) {
		// Execute migration - NO TRANSACTION WRAPPING
		if migration.Migrate != nil {
			if err := migration.Migrate(x.db); err != nil {
				return fmt.Errorf("migration %s failed: %s", migration.ID, err.Error())
			}
		}

		// Record migration ID AFTER execution
		if err := x.insertMigration(migration.ID); err != nil {
			return fmt.Errorf("inserting migration %s failed: %s", migration.ID, err.Error())
		}
	}
	return nil
}
```
When `migration.Migrate(x.db)` is called, the migration receives a `*xorm.Engine`. Every operation (`tx.Exec()`, `tx.Find()`, `tx.Update()`) creates a fresh auto-commit session — **there is no transaction wrapping**.

**Failure scenarios:**

1. **Migration succeeds but the insert fails:** The migration's effects are committed but not recorded. The next startup re-attempts the migration, which may fail with "duplicate column" errors.

2. **Migration partially succeeds, then fails:** For multi-statement migrations, early statements are auto-committed. The migration is not recorded, and the next attempt sees a partially modified database.

3. **Early migration failure halts the entire chain:** If migration N fails, migrations N+1 through M are never attempted. The process calls `log.Fatalf` at `pkg/migration/migration.go:78`.

**This could explain Reporter 2's missing migrations after `20240919130957`.**

---
### Problem 4: Logic Bug in Migration 20241028131622

**Location:** `pkg/migration/20241028131622.go:57`

```go
if err != nil && (!strings.Contains(err.Error(), "Error 1061") || !strings.Contains(err.Error(), "Duplicate key name")) {
	return err
}
```
This condition uses `||` (OR) when it should use `&&` (AND).

**Current behavior:** "Return the error if it does NOT contain 'Error 1061' OR does NOT contain 'Duplicate key name'"

With `||`, the error is only suppressed when the message contains *both* patterns. Whenever a driver or server version produces a message containing just one of them, at least one `!strings.Contains` is `true` and the error is **returned instead of suppressed**.

**The correct logic is:**

```go
if err != nil && !strings.Contains(err.Error(), "Error 1061") && !strings.Contains(err.Error(), "Duplicate key name") {
	return err
}
```

On MySQL 8 (which doesn't support `IF NOT EXISTS` for `CREATE INDEX`), if the indexes already exist, this migration would fail and halt the entire migration chain at `20241028131622`, leaving all subsequent migrations unrecorded.

**This could specifically explain Reporter 2's case on MySQL 8.**
---

## Evidence Summary

| Finding | Location | Impact |
|---------|----------|--------|
| No UNIQUE/PK on `migration.id` | xormigrate `Migration` struct | Allows duplicate rows |
| No locking in migration process | xormigrate `migrate()` | Race condition on concurrent starts |
| `initSchema` inserts IDs without running migrations | xormigrate `runInitSchema()` | Data transformations skipped |
| Auto-commit per statement, no transaction | xormigrate `runMigration()` | Partial failures leave inconsistent state |
| `\|\|` vs `&&` logic bug | `20241028131622.go:57` | False failures on MySQL 8 |
---

## Existing Mitigations

The team has already added "catch-up" migrations that re-check and fix data:

- `20250323212553.go` — Re-converts any `filter` values still in the old string format
- `20251001113831.go` — Re-converts any `bucket_configuration` filters still in string format

These migrations are idempotent and handle mixed-format data, which suggests awareness of this pattern.

---
## Recommended Fixes

### Fix 1: Add UNIQUE constraint to migration table

Either patch xormigrate or add a startup check:

```go
// Option A: Modify xormigrate's Migration struct
type Migration struct {
	ID string `xorm:"pk"` // or `xorm:"unique"`
	// ...
}

// Option B: Add the constraint via raw SQL at startup:
//   ALTER TABLE migration ADD CONSTRAINT migration_id_unique UNIQUE (id);
```
### Fix 2: Add database-level advisory locking

Wrap the entire migration process in an advisory lock:

```sql
-- MySQL
SELECT GET_LOCK('vikunja_migrations', 30);
-- ... run migrations ...
SELECT RELEASE_LOCK('vikunja_migrations');

-- PostgreSQL
SELECT pg_advisory_lock(12345);
-- ... run migrations ...
SELECT pg_advisory_unlock(12345);
```
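PostgreSQL advisory locks take a numeric key rather than a name, so callers typically derive a stable integer from a lock-name string first; a minimal sketch of one such convention (an assumption for illustration, not existing Vikunja code):

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// advisoryLockKey maps a lock name to a stable 64-bit key suitable for
// the single-argument form of pg_advisory_lock.
func advisoryLockKey(name string) int64 {
	h := fnv.New64a()
	h.Write([]byte(name))
	return int64(h.Sum64())
}

func main() {
	fmt.Println(advisoryLockKey("vikunja_migrations"))
}
```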
### Fix 3: Fix the `||` vs `&&` logic bug

In `pkg/migration/20241028131622.go:57`:

```go
// Before (buggy)
if err != nil && (!strings.Contains(err.Error(), "Error 1061") || !strings.Contains(err.Error(), "Duplicate key name")) {

// After (correct)
if err != nil && !strings.Contains(err.Error(), "Error 1061") && !strings.Contains(err.Error(), "Duplicate key name") {
```
### Fix 4: Make data migrations idempotent

Every data-transformation migration should check the current state before modifying:

```go
// Example: Only convert if not already in the new JSON format
if !strings.HasPrefix(view.Filter, "{") {
	// Convert to new format
}
```

### Fix 5: Consider a "repair" command

Add a CLI command that re-runs data migrations regardless of the tracking table, useful for recovering from inconsistent states.

---
## Questions for Further Investigation

1. Can we reproduce the race condition in a test environment?
2. What is the exact MySQL version and error message for Reporter 2?
3. Should we upstream fixes to xormigrate or fork/vendor it?
4. Is there telemetry showing how common concurrent starts are?

plans/task-index-increment.md (new file, 176 lines)
# Design Doc: Project-Scoped Auto-Incrementing Task Index

**Status:** Draft

**Date:** 2026-01-30

---

## Problem Statement

Our application has a **tasks** table where each task belongs to a project. Tasks have two identifiers:

- **id** — A globally unique, auto-incrementing primary key
- **index** — A human-friendly, project-scoped sequential number (e.g., PROJ-1, PROJ-2)

The **index** field must increment independently per project, starting at 1 for each new project.

Currently, the application code computes the next index by querying `MAX(index) + 1` before inserting a new task. This approach fails under concurrent inserts — two simultaneous requests can read the same MAX value and produce duplicate indexes, violating data integrity.

---
## Requirements

1. **Uniqueness:** Each (project_id, index) pair must be unique
2. **Sequential:** Indexes should be sequential within each project (no gaps under normal operation)
3. **Concurrent-safe:** Must handle multiple simultaneous inserts correctly
4. **Performance:** Should scale with the number of projects and tasks

---

## Options

### Option 1: Counter Table with Row-Level Locking

Maintain a separate table that tracks the last assigned index for each project. A trigger atomically increments the counter and assigns the value to new tasks.

**Implementation:**
```sql
-- Counter table
CREATE TABLE project_task_counters (
    project_id INTEGER PRIMARY KEY REFERENCES projects(id),
    last_index INTEGER NOT NULL DEFAULT 0
);

-- Trigger function
CREATE OR REPLACE FUNCTION set_task_index()
RETURNS TRIGGER AS $$
BEGIN
    INSERT INTO project_task_counters (project_id, last_index)
    VALUES (NEW.project_id, 1)
    ON CONFLICT (project_id) DO UPDATE
        SET last_index = project_task_counters.last_index + 1
    RETURNING last_index INTO NEW.index;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER task_set_index
    BEFORE INSERT ON tasks
    FOR EACH ROW EXECUTE FUNCTION set_task_index();
```
**How it works:**

- The UPSERT (`INSERT ... ON CONFLICT DO UPDATE`) acquires a row-level lock on the counter row
- Concurrent transactions block until the lock is released
- Each transaction is guaranteed the next sequential value
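Under this trigger, consecutive inserts into the same project receive consecutive indexes; a short worked example (the `title` column is assumed for illustration):

```sql
-- First insert for project 7: no counter row exists yet, so the INSERT
-- branch runs, last_index becomes 1, and the trigger sets NEW.index = 1.
INSERT INTO tasks (project_id, title) VALUES (7, 'first task');

-- Second insert hits ON CONFLICT, bumps last_index to 2, NEW.index = 2.
INSERT INTO tasks (project_id, title) VALUES (7, 'second task');

-- The counter row always reflects the highest index handed out so far:
SELECT last_index FROM project_task_counters WHERE project_id = 7;  -- 2
```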
---
### Option 2: Advisory Locks with MAX Query

Use PostgreSQL advisory locks to serialize inserts per project, then compute `MAX(index) + 1` safely.

**Implementation:**
```sql
CREATE OR REPLACE FUNCTION set_task_index()
RETURNS TRIGGER AS $$
BEGIN
    -- Lock based on project_id (released at transaction end)
    PERFORM pg_advisory_xact_lock('tasks'::regclass::integer, NEW.project_id);

    SELECT COALESCE(MAX(index), 0) + 1 INTO NEW.index
    FROM tasks
    WHERE project_id = NEW.project_id;

    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER task_set_index
    BEFORE INSERT ON tasks
    FOR EACH ROW EXECUTE FUNCTION set_task_index();
```
**How it works:**

- `pg_advisory_xact_lock` acquires an exclusive lock scoped to the project
- The lock is automatically released when the transaction commits or rolls back
- No additional tables required

---
### Option 3: Optimistic Locking with Retry

Add a unique constraint on (project_id, index) and retry on conflict at the application level.

**Implementation:**

```sql
-- Database constraint
ALTER TABLE tasks ADD CONSTRAINT tasks_project_index_unique
    UNIQUE (project_id, index);
```
```python
# Application code (pseudocode)
def create_task(project_id, title):
    for attempt in range(MAX_RETRIES):
        try:
            next_index = db.query(
                "SELECT COALESCE(MAX(index),0)+1 FROM tasks WHERE project_id=$1",
                project_id,
            )
            db.execute(
                "INSERT INTO tasks (project_id, index, title) VALUES ($1, $2, $3)",
                project_id, next_index, title,
            )
            return
        except UniqueViolation:
            continue
    raise Exception("Failed after retries")
```
---

## Trade-off Comparison

| Criteria | Option 1: Counter Table | Option 2: Advisory Locks | Option 3: Optimistic Retry |
|----------|------------------------|-------------------------|---------------------------|
| **Performance** | O(1) — single row update | O(n) — scans tasks table | O(n) per attempt — may retry multiple times |
| **Complexity** | Medium — extra table to manage | Low — no schema changes | Medium — retry logic in app |
| **Scalability** | Excellent — constant time regardless of task count | Degrades with task count per project | Degrades under high contention |
| **Contention Behavior** | Serializes at counter row — predictable | Serializes via advisory lock — predictable | Retries under collision — unpredictable latency |
| **Gap-Free Guarantee** | Yes (if transactions commit) | Yes (if transactions commit) | Yes (if transactions commit) |
| **External Dependencies** | None — pure PostgreSQL | None — pure PostgreSQL | Application retry logic |
---

## Recommendation

**Option 1 (Counter Table)** is recommended for most use cases because:

- **Constant-time performance** regardless of how many tasks exist per project
- **Fully contained in PostgreSQL** — no application-level coordination required
- **Predictable behavior** under concurrent load
- **Simple operational model** — the counter table can be inspected for debugging

Consider **Option 2 (Advisory Locks)** if you have a small number of tasks per project and want to avoid schema changes.

**Option 3 (Optimistic Retry)** is not recommended for this use case because it shifts complexity to the application layer and has unpredictable performance under contention.

---
## Migration Notes

If adopting Option 1 with existing data, initialize the counter table so future inserts continue from each project's current maximum:

```sql
INSERT INTO project_task_counters (project_id, last_index)
SELECT project_id, MAX(index)
FROM tasks
GROUP BY project_id;
```

Projects without tasks need no row here: the trigger's `INSERT ... ON CONFLICT` path creates their counter on the first insert.