7 Commits

narugo1992
6b4269404a fix(seed): share SEED_VERSION between seed + verify scripts
verify_seed_data.py hardcoded EXPECTED_SEED_VERSION = "local-dev-demo-v3"
but seed_demo_data.py was bumped to v4 in the preview PR, so the
post-seed verifier would falsely fail with a version mismatch. Extract
the constant to scripts/dev/seed_shared.py and import it from both
sides so the two scripts always agree.
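
A minimal sketch of the shared-constant pattern, in Python like the seed scripts. Only the module path and the v4 value come from this log. `verify_seed_version`, `SeedVersionMismatch`, and the idea that the verifier compares against a version string recorded by the seed run are illustrative assumptions, not the scripts' actual code:

```python
# scripts/dev/seed_shared.py: the one place the seed version is defined.
SEED_VERSION = "local-dev-demo-v4"

# seed_demo_data.py and verify_seed_data.py would both import it, e.g.
#     from seed_shared import SEED_VERSION
# instead of each keeping its own string literal.


class SeedVersionMismatch(RuntimeError):
    """Hypothetical error raised when seeded data came from another version."""


def verify_seed_version(recorded: str, expected: str = SEED_VERSION) -> None:
    # `recorded` is whatever version string the seed run persisted (assumed).
    if recorded != expected:
        raise SeedVersionMismatch(
            f"seed version mismatch: expected {expected!r}, found {recorded!r}"
        )
```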

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 14:02:33 +08:00
narugo1992
d0445282fb feat: pure-client safetensors/parquet metadata preview (#27)
Implements issue #27 v4: file-level HF-compatible metadata preview
computed entirely in the browser via HTTP Range reads against the
existing /resolve/ 302 → presigned S3/MinIO URL. Zero new backend
preview code, zero LRU cache, zero precomputation, zero new DB state.

Backend (minimal CORS plumbing only):
- main.py CORSMiddleware: add `expose_headers` so browsers can read
  Content-Range / X-Linked-* / X-Repo-Commit / ETag / Location off
  the final 206 response that follows the /resolve/ 302.
- docker-compose.example.yml + scripts/dev/up_infra.sh: wire
  `MINIO_API_CORS_ALLOW_ORIGIN` so the SPA can cross-origin Range-read
  presigned targets. Configurable via `DEV_MINIO_CORS_ALLOW_ORIGIN`.
- docs/development/local-dev.md: MinIO CORS section explaining the
  hard prerequisite + smoke-test probe + how to recreate the container.
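
Background on why `expose_headers` matters: under the WHATWG Fetch standard, script running on another origin can read only the CORS-safelisted response headers (Cache-Control, Content-Language, Content-Length, Content-Type, Expires, Last-Modified, Pragma) plus any names listed in `Access-Control-Expose-Headers`, so Content-Range and ETag stay hidden without it. A rough Python model of that rule follows; `readable_headers` is a hypothetical helper, and it deliberately ignores the `*` wildcard form:

```python
# CORS-safelisted response-header names (Fetch standard): always readable.
SAFELISTED = {
    "cache-control", "content-language", "content-length",
    "content-type", "expires", "last-modified", "pragma",
}


def readable_headers(response_headers: dict[str, str]) -> set[str]:
    """Return the lower-cased header names a cross-origin script may read.

    Sketch only: the `*` wildcard value of Access-Control-Expose-Headers
    is not modelled here.
    """
    lowered = {k.lower(): v for k, v in response_headers.items()}
    exposed = {
        name.strip().lower()
        for name in lowered.get("access-control-expose-headers", "").split(",")
        if name.strip()
    }
    # A header is visible if it is safelisted or explicitly exposed.
    return {name for name in lowered if name in SAFELISTED or name in exposed}
```

For example, a 206 carrying `Content-Type`, `Content-Range`, and `ETag` with `Access-Control-Expose-Headers: Content-Range, ETag` yields all three; drop the expose header and only `content-type` remains readable.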

Frontend:
- utils/safetensors.js (~190 LOC): pure-JS parser mirroring
  huggingface_hub.parse_safetensors_file_metadata byte-for-byte
  (speculative 100 KB first read, two-read fallback for fat headers,
  SAFETENSORS_MAX_HEADER_LENGTH guard). Exposes parseSafetensorsMetadata
  + summarizeSafetensors.
- utils/parquet.js: thin wrapper over hyparquet's asyncBufferFromUrl +
  parquetMetadataAsync with mode:"cors" + credentials:"omit" so cookies
  never leak onto presigned URLs. Normalizes BigInt row counts.
- components/repo/preview/FilePreviewDialog.vue: ElDialog with
  per-phase spinner text (range-head → parsing → done for safetensors,
  head → footer → parsing → done for parquet), dtype/row-group tables,
  and an explicit "CORS likely misconfigured" placeholder on failure.
- RepoViewer.vue: HF-style chart-line-data icon next to .safetensors
  and .parquet rows; click opens the modal with the resolved /resolve/
  URL for the current branch.
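
The two Range-read layouts the parsers rely on can be sketched in Python as below. This is an illustration under stated assumptions, not a port of utils/safetensors.js or of hyparquet: `read_range` stands in for an HTTP `Range: bytes=start-end` fetch against the presigned URL, the 25,000,000-byte cap is assumed to match huggingface_hub's SAFETENSORS_MAX_HEADER_LENGTH, and the Parquet helper only locates the raw Thrift footer bytes without decoding them:

```python
import json
import struct
from typing import Callable

# Hypothetical range reader: returns bytes [start, end] inclusive, like an
# HTTP `Range: bytes=start-end` response (may come back short at EOF).
RangeReader = Callable[[int, int], bytes]

SPECULATIVE_READ = 100_000        # the "speculative 100 KB first read"
MAX_HEADER_LENGTH = 25_000_000    # assumed SAFETENSORS_MAX_HEADER_LENGTH


def read_safetensors_header(read_range: RangeReader) -> dict:
    first = read_range(0, SPECULATIVE_READ)
    if len(first) < 8:
        raise ValueError("file too short for a safetensors header")
    # First 8 bytes: little-endian u64 length of the JSON header.
    (header_len,) = struct.unpack("<Q", first[:8])
    if header_len > MAX_HEADER_LENGTH:
        raise ValueError(f"header too large: {header_len} bytes")
    if 8 + header_len <= len(first):
        raw = first[8 : 8 + header_len]           # header fit in first read
    else:
        raw = read_range(8, 8 + header_len - 1)   # second read for fat headers
    return json.loads(raw.decode("utf-8"))


PARQUET_MAGIC = b"PAR1"


def read_parquet_footer(read_range: RangeReader, file_size: int) -> bytes:
    # A Parquet file ends with: <footer metadata><u32 LE footer length>"PAR1".
    tail = read_range(file_size - 8, file_size - 1)
    if tail[4:] != PARQUET_MAGIC:
        raise ValueError("missing trailing PAR1 magic")
    (footer_len,) = struct.unpack("<I", tail[:4])
    start = file_size - 8 - footer_len
    if start < 4:  # the file must also begin with a 4-byte PAR1 magic
        raise ValueError("footer length exceeds file size")
    # Raw Thrift-encoded FileMetaData; decoding is left to a real parser.
    return read_range(start, file_size - 9)
```

Checking `8 + header_len` against the bytes actually returned, rather than against the requested 100 KB, also covers files shorter than the speculative read, where the server answers with a truncated range.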

Tests + fixtures:
- test_files.py::test_resolve_get_302_exposes_cors_headers_for_browser_preview
  pins the `Access-Control-Expose-Headers` list against regressions.
- test/kohaku-hub-ui/utils/test_safetensors.test.js: 6 cases covering
  the real-HF-format fixture, dtype summary, progress phases, fat-header
  fallback, oversized-header guard, and non-206 error paths.
- test/kohaku-hub-ui/utils/test_parquet.test.js: footer parse +
  progress phase assertions.
- test/kohaku-hub-ui/fixtures/previews/{tiny.safetensors,tiny.parquet}:
  byte-identical-to-HF fixtures produced by the real safetensors /
  pyarrow libs via scripts/dev/generate_preview_test_fixtures.py
  (committed so tests stay offline per AGENTS.md §5.2).

Seed:
- seed_demo_data.py: add two RemoteAsset entries for real HF-hosted
  small fixtures pinned by sha256, and wire them into visible paths
  (open-media-lab/vision-language-assistant-3b/fixtures/hf-tiny-random-bert.safetensors,
  open-media-lab/multimodal-benchmark-suite/fixtures/hf-no-robots-test.parquet)
  so the preview can be exercised against files that actually came off
  huggingface.co rather than purely local pyarrow/safetensors output.
  SEED_VERSION bumped to local-dev-demo-v4.
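
One way to enforce the sha256 pin before a fetched fixture is written into a seeded repo, as a Python sketch. `verify_pinned_asset` and `PinnedHashMismatch` are hypothetical names, and how seed_demo_data.py actually stores the digest on a RemoteAsset is not shown in this log:

```python
import hashlib


class PinnedHashMismatch(ValueError):
    """Hypothetical error: downloaded bytes do not match the pinned digest."""


def verify_pinned_asset(data: bytes, expected_sha256: str) -> bytes:
    # Refuse to seed bytes whose SHA-256 differs from the pinned hex digest.
    actual = hashlib.sha256(data).hexdigest()
    if actual != expected_sha256.lower():
        raise PinnedHashMismatch(f"expected {expected_sha256}, got {actual}")
    return data
```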

Verified end-to-end against the dev stack: safetensors parser output
on the seeded fixtures matches huggingface_hub.parse_safetensors_file_metadata
byte-for-byte on the same file (100 tensors, 126,851 params, I64=512
/ F32=126,339, metadata `{format: pt, ...}`). Browser preview modal
renders both file kinds correctly.
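
The per-dtype figures above add up (512 + 126,339 = 126,851). A Python sketch of that kind of summary, computing element counts per dtype from a parsed safetensors header; `summarize_dtypes` is a hypothetical name and is not claimed to match summarizeSafetensors' exact output shape:

```python
import math
from collections import Counter


def summarize_dtypes(header: dict) -> tuple[int, int, dict[str, int]]:
    """Return (tensor count, total params, params per dtype) for a header."""
    per_dtype = Counter()
    tensors = 0
    for name, info in header.items():
        if name == "__metadata__":   # free-form string map, not a tensor
            continue
        tensors += 1
        # Element count is the product of the shape; a scalar ([]) counts as 1.
        per_dtype[info["dtype"]] += math.prod(info["shape"])
    return tensors, sum(per_dtype.values()), dict(per_dtype)
```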

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 13:41:26 +08:00
narugo1992
25f5412779 Seed HuggingFace as a global fallback source in make seed-demo
The local demo seed now installs https://huggingface.co as a low-priority
(priority=1000) global fallback source via the admin API, so a fresh
`make seed-demo` can resolve public HF repos out of the box. Bumps the
seed version to local-dev-demo-v3 and updates verify_seed_data.py to
assert the seeded source is advertised via /api/fallback-sources/available.

The creation step is idempotent: it lists global sources first and skips
the insert when a matching URL already exists.
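
The list-then-create idempotency check can be sketched as follows. `list_sources` and `create_source` are hypothetical stand-ins for the admin API calls, the payload shape is assumed, and URLs are compared exactly (no trailing-slash normalization):

```python
from typing import Callable

HF_URL = "https://huggingface.co"


def ensure_global_fallback_source(
    list_sources: Callable[[], list[dict]],
    create_source: Callable[[dict], dict],
    url: str = HF_URL,
    priority: int = 1000,
) -> bool:
    """Create the global fallback source unless one with `url` already exists.

    Returns True if a source was created, False if it was already present.
    """
    # List first so re-running the seed is a no-op (idempotent).
    if any(src.get("url") == url for src in list_sources()):
        return False
    create_source({"url": url, "priority": priority})  # payload shape assumed
    return True
```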

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 12:02:44 +08:00
narugo1992
8088fdb98a Fix repo tree path handling 2026-04-21 15:09:58 +08:00
narugo1992
03e95a6290 Expand local demo seed assets 2026-04-21 13:51:46 +08:00
narugo1992
f76a8d49f2 Refine local dev reset workflow 2026-04-21 12:50:30 +08:00
narugo1992
4c6dba6458 Add local development bootstrap and deterministic demo fixtures 2026-04-19 14:14:15 +08:00