verify_seed_data.py hardcoded EXPECTED_SEED_VERSION = "local-dev-demo-v3",
but seed_demo_data.py was bumped to v4 in the preview PR, so the
post-seed verifier failed with a spurious version mismatch. Extract
the constant to scripts/dev/seed_shared.py and import it from both
sides so the two scripts always agree.
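A minimal sketch of the resulting layout, collapsed into one file for
brevity. Only the path scripts/dev/seed_shared.py and the name
EXPECTED_SEED_VERSION come from this change; the shared constant name
SEED_VERSION is assumed, and check_seed_version is a hypothetical helper
used for illustration:

```python
# --- scripts/dev/seed_shared.py: the single source of truth ---
SEED_VERSION = "local-dev-demo-v4"  # assumed name for the shared constant

# --- scripts/dev/seed_demo_data.py would do: ---
# from seed_shared import SEED_VERSION

# --- scripts/dev/verify_seed_data.py would do: ---
# from seed_shared import SEED_VERSION
EXPECTED_SEED_VERSION = SEED_VERSION  # no second string literal left to drift


def check_seed_version(reported: str) -> None:
    """Hypothetical verifier step: raise if the seeded version differs."""
    if reported != EXPECTED_SEED_VERSION:
        raise RuntimeError(
            f"seed version mismatch: expected {EXPECTED_SEED_VERSION!r}, "
            f"got {reported!r}"
        )
```

With one definition, a future bump touches seed_shared.py only and the
verifier follows automatically.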
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Implements issue #27 v4: file-level HF-compatible metadata preview
computed entirely in the browser via HTTP Range reads against the
existing /resolve/ 302 redirect to a presigned S3/MinIO URL. No new backend
preview code, no LRU cache, no precomputation, no new DB state.
Backend (minimal CORS plumbing only):
- main.py CORSMiddleware: add `expose_headers` so browsers can read
Content-Range / X-Linked-* / X-Repo-Commit / ETag / Location off
the final 206 response that follows the /resolve/ 302.
- docker-compose.example.yml + scripts/dev/up_infra.sh: wire
`MINIO_API_CORS_ALLOW_ORIGIN` so the SPA can cross-origin Range-read
presigned targets. Configurable via `DEV_MINIO_CORS_ALLOW_ORIGIN`.
- docs/development/local-dev.md: MinIO CORS section explaining the
hard prerequisite + smoke-test probe + how to recreate the container.
Frontend:
- utils/safetensors.js (~190 LOC): pure-JS parser mirroring
huggingface_hub.parse_safetensors_file_metadata byte-for-byte
(speculative 100 KB first read, two-read fallback for fat headers,
SAFETENSORS_MAX_HEADER_LENGTH guard). Exposes parseSafetensorsMetadata
+ summarizeSafetensors.
- utils/parquet.js: thin wrapper over hyparquet's asyncBufferFromUrl +
parquetMetadataAsync with mode:"cors" + credentials:"omit" so cookies
never leak onto presigned URLs. Normalizes BigInt row counts.
- components/repo/preview/FilePreviewDialog.vue: ElDialog with
per-phase spinner text (range-head → parsing → done for safetensors,
head → footer → parsing → done for parquet), dtype/row-group tables,
and an explicit "CORS likely misconfigured" placeholder on failure.
- RepoViewer.vue: HF-style chart-line-data icon next to .safetensors
and .parquet rows; click opens the modal with the resolved /resolve/
URL for the current branch.
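The safetensors read strategy described above can be sketched in Python
as follows. This is an illustration of the technique, not the code in
utils/safetensors.js: read_range is a hypothetical callable standing in
for an HTTP Range request, the function names are made up, and the
25,000,000-byte limit is assumed to match SAFETENSORS_MAX_HEADER_LENGTH
in huggingface_hub.

```python
import json
import struct
from collections import Counter
from math import prod
from typing import Callable

SPECULATIVE_READ = 100_000      # size of the first Range read, in bytes
MAX_HEADER_LENGTH = 25_000_000  # assumed SAFETENSORS_MAX_HEADER_LENGTH value


def parse_safetensors_header(read_range: Callable[[int, int], bytes]) -> dict:
    """Parse the JSON header of a safetensors file.

    read_range(start, end) must return bytes start..end inclusive, the same
    convention as an HTTP "Range: bytes=start-end" request.
    """
    first = read_range(0, SPECULATIVE_READ - 1)
    if len(first) < 8:
        raise ValueError("file too short for a safetensors header")
    # The first 8 bytes are the header length as a little-endian uint64.
    (header_len,) = struct.unpack("<Q", first[:8])
    if header_len > MAX_HEADER_LENGTH:
        raise ValueError(f"header too large: {header_len} bytes")
    if 8 + header_len <= len(first):
        raw = first[8 : 8 + header_len]          # speculative read was enough
    else:
        raw = read_range(8, 8 + header_len - 1)  # fat header: second read
    return json.loads(raw.decode("utf-8"))


def summarize(header: dict) -> dict:
    """Count tensors and parameters per dtype; "__metadata__" is not a tensor."""
    tensors = {k: v for k, v in header.items() if k != "__metadata__"}
    by_dtype: Counter = Counter()
    for info in tensors.values():
        by_dtype[info["dtype"]] += prod(info["shape"])  # prod([]) == 1 (scalar)
    return {
        "tensors": len(tensors),
        "params": sum(by_dtype.values()),
        "by_dtype": dict(by_dtype),
        "metadata": header.get("__metadata__", {}),
    }
```

Most files need a single request because the header fits inside the
speculative read; only unusually large headers pay for the second one.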
Tests + fixtures:
- test_files.py::test_resolve_get_302_exposes_cors_headers_for_browser_preview
pins the `Access-Control-Expose-Headers` list against regressions.
- test/kohaku-hub-ui/utils/test_safetensors.test.js: 6 cases covering
the real-HF-format fixture, dtype summary, progress phases, fat-header
fallback, oversized-header guard, and non-206 error paths.
- test/kohaku-hub-ui/utils/test_parquet.test.js: footer parse +
progress phase assertions.
- test/kohaku-hub-ui/fixtures/previews/{tiny.safetensors,tiny.parquet}:
byte-identical-to-HF fixtures produced by the real safetensors /
pyarrow libs via scripts/dev/generate_preview_test_fixtures.py
(committed so tests stay offline per AGENTS.md §5.2).
Seed:
- seed_demo_data.py: add two RemoteAsset entries for real HF-hosted
small fixtures pinned by sha256, and wire them into visible paths
(open-media-lab/vision-language-assistant-3b/fixtures/hf-tiny-random-bert.safetensors,
open-media-lab/multimodal-benchmark-suite/fixtures/hf-no-robots-test.parquet)
so the preview can be exercised against files that actually came off
huggingface.co rather than purely local pyarrow/safetensors output.
SEED_VERSION bumped to local-dev-demo-v4.
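Pinning by sha256 amounts to a check like the one below. This is a
sketch only: PinnedAsset is a hypothetical stand-in for the seed
script's RemoteAsset, whose real fields are not shown here, and
verify_pinned is an illustrative helper name.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class PinnedAsset:
    """Hypothetical stand-in for RemoteAsset: a URL plus its expected digest."""

    url: str
    sha256: str  # lowercase hex digest the downloaded bytes must match


def verify_pinned(asset: PinnedAsset, payload: bytes) -> bytes:
    """Return payload unchanged if its sha256 matches the pin, else raise."""
    digest = hashlib.sha256(payload).hexdigest()
    if digest != asset.sha256.lower():
        raise ValueError(
            f"sha256 mismatch for {asset.url}: "
            f"expected {asset.sha256}, got {digest}"
        )
    return payload
```

A mismatch fails the seed run instead of silently seeding a file that
changed upstream.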
Verified end-to-end against the dev stack: on the seeded safetensors
fixture, the parser output matches huggingface_hub.parse_safetensors_file_metadata
byte-for-byte on the same file (100 tensors, 126,851 params, I64=512
/ F32=126,339, metadata `{format: pt, ...}`). The browser preview modal
renders both file kinds correctly.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The local demo seed now installs https://huggingface.co as a low-priority
(priority=1000) global fallback source via the admin API, so a fresh
`make seed-demo` can resolve public HF repos out of the box. Bumps the
seed version to local-dev-demo-v3 and updates verify_seed_data.py to
assert the seeded source is advertised via /api/fallback-sources/available.
The creation step is idempotent: it lists global sources first and skips
the insert when a matching URL already exists.
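The list-then-skip logic can be sketched as below. The two callables are
hypothetical stand-ins for the admin API calls, the payload field names
are assumed, and the trailing-slash normalization is an addition for
illustration rather than something the seed script is known to do:

```python
from typing import Callable, Iterable

HF_URL = "https://huggingface.co"


def ensure_global_fallback_source(
    list_sources: Callable[[], Iterable[dict]],  # hypothetical: list global sources
    create_source: Callable[[dict], None],       # hypothetical: insert one source
    url: str = HF_URL,
    priority: int = 1000,
) -> bool:
    """Create the source unless one with the same URL exists.

    Returns True if a source was created, False if the insert was skipped.
    """
    wanted = url.rstrip("/")
    for source in list_sources():
        if str(source.get("url", "")).rstrip("/") == wanted:
            return False  # already seeded: running again changes nothing
    create_source({"url": url, "priority": priority})
    return True
```

Running the seed twice therefore leaves exactly one matching source.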
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>