mirror of
https://github.com/KohakuBlueleaf/KohakuHub.git
synced 2026-05-06 04:17:46 -05:00
Implements issue #27 v4: file-level HF-compatible metadata preview computed entirely in the browser via HTTP Range reads against the existing /resolve/ 302 → presigned S3/MinIO URL. Zero new backend preview code, zero LRU, zero precomputation, zero new DB state.

Backend (minimal CORS plumbing only):
- main.py CORSMiddleware: add `expose_headers` so browsers can read Content-Range / X-Linked-* / X-Repo-Commit / ETag / Location off the final 206 response that follows the /resolve/ 302.
- docker-compose.example.yml + scripts/dev/up_infra.sh: wire `MINIO_API_CORS_ALLOW_ORIGIN` so the SPA can cross-origin Range-read presigned targets. Configurable via `DEV_MINIO_CORS_ALLOW_ORIGIN`.
- docs/development/local-dev.md: MinIO CORS section explaining the hard prerequisite + smoke-test probe + how to recreate the container.

Frontend:
- utils/safetensors.js (~190 LOC): pure-JS parser mirroring huggingface_hub.parse_safetensors_file_metadata byte-for-byte (speculative 100 KB first read, two-read fallback for fat headers, SAFETENSORS_MAX_HEADER_LENGTH guard). Exposes parseSafetensorsMetadata + summarizeSafetensors.
- utils/parquet.js: thin wrapper over hyparquet's asyncBufferFromUrl + parquetMetadataAsync with mode:"cors" + credentials:"omit" so cookies never leak onto presigned URLs. Normalizes BigInt row counts.
- components/repo/preview/FilePreviewDialog.vue: ElDialog with per-phase spinner text (range-head → parsing → done for safetensors, head → footer → parsing → done for parquet), dtype/row-group tables, and an explicit "CORS likely misconfigured" placeholder on failure.
- RepoViewer.vue: HF-style chart-line-data icon next to .safetensors and .parquet rows; click opens the modal with the resolved /resolve/ URL for the current branch.

Tests + fixtures:
- test_files.py::test_resolve_get_302_exposes_cors_headers_for_browser_preview pins the `Access-Control-Expose-Headers` list against regressions.
- test/kohaku-hub-ui/utils/test_safetensors.test.js: 6 cases covering the real-HF-format fixture, dtype summary, progress phases, fat-header fallback, oversized-header guard, and non-206 error paths.
- test/kohaku-hub-ui/utils/test_parquet.test.js: footer parse + progress phase assertions.
- test/kohaku-hub-ui/fixtures/previews/{tiny.safetensors,tiny.parquet}: byte-identical-to-HF fixtures produced by the real safetensors / pyarrow libs via scripts/dev/generate_preview_test_fixtures.py (committed so tests stay offline per AGENTS.md §5.2).

Seed:
- seed_demo_data.py: add two RemoteAsset entries for real HF-hosted small fixtures pinned by sha256, and wire them into visible paths (open-media-lab/vision-language-assistant-3b/fixtures/hf-tiny-random-bert.safetensors, open-media-lab/multimodal-benchmark-suite/fixtures/hf-no-robots-test.parquet) so the preview can be exercised against files that actually came off huggingface.co rather than purely local pyarrow/safetensors output. SEED_VERSION bumped to local-dev-demo-v4.

Verified end-to-end against the dev stack: safetensors parser output on the seeded fixtures matches huggingface_hub.parse_safetensors_file_metadata byte-for-byte on the same file (100 tensors, 126,851 params, I64=512 / F32=126,339, metadata `{format: pt, ...}`). Browser preview modal renders both file kinds correctly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
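
For readers unfamiliar with the safetensors header read described above (speculative first read, second read for fat headers, header-length guard), here is a minimal sketch of the idea. It is written in Python for brevity and is not the code in utils/safetensors.js. The names `read_range`, `parse_safetensors_header`, `summarize` and `build_demo_file` are hypothetical, and the two constants are assumed values chosen to match the "100 KB" and guard behaviour mentioned in this commit. `read_range(start, end)` stands in for an HTTP request with `Range: bytes=start-end`.

```python
import json
import struct
from collections import Counter
from math import prod
from typing import Callable

FIRST_READ_BYTES = 100_000      # assumption: size of the speculative first read
MAX_HEADER_LENGTH = 25_000_000  # assumption: value of the header-length guard

# Hypothetical transport: returns bytes start..end inclusive, like "Range: bytes=start-end".
RangeReader = Callable[[int, int], bytes]


def parse_safetensors_header(read_range: RangeReader) -> dict:
    """Read only the JSON header of a safetensors file using at most two range reads."""
    first = read_range(0, FIRST_READ_BYTES - 1)
    if len(first) < 8:
        raise ValueError("file too short to hold the 8-byte length prefix")
    # The file starts with the header length as a little-endian unsigned 64-bit integer.
    (header_len,) = struct.unpack("<Q", first[:8])
    if header_len > MAX_HEADER_LENGTH:
        raise ValueError(f"header length {header_len} exceeds guard {MAX_HEADER_LENGTH}")
    if 8 + header_len <= len(first):
        raw = first[8 : 8 + header_len]           # header fit in the speculative read
    else:
        raw = read_range(8, 8 + header_len - 1)   # fat header: fetch exactly the header
    if len(raw) != header_len:
        raise ValueError("truncated header")
    header = json.loads(raw.decode("utf-8"))
    metadata = header.pop("__metadata__", {})     # free-form string metadata, optional
    return {"metadata": metadata, "tensors": header}


def summarize(tensors: dict) -> dict:
    """Count parameters per dtype from the shapes listed in the header."""
    per_dtype = Counter()
    for info in tensors.values():
        per_dtype[info["dtype"]] += prod(info["shape"])  # prod([]) == 1 for scalars
    return {
        "tensor_count": len(tensors),
        "total_params": sum(per_dtype.values()),
        "params_by_dtype": dict(per_dtype),
    }


def build_demo_file(tensors: dict, metadata: dict) -> bytes:
    """Build an in-memory file with a valid header and zeroed tensor data (demo only)."""
    raw = json.dumps({**tensors, "__metadata__": metadata}).encode("utf-8")
    data_size = max((t["data_offsets"][1] for t in tensors.values()), default=0)
    return struct.pack("<Q", len(raw)) + raw + b"\x00" * data_size


demo = build_demo_file(
    {
        "w": {"dtype": "F32", "shape": [2, 3], "data_offsets": [0, 24]},
        "ids": {"dtype": "I64", "shape": [4], "data_offsets": [24, 56]},
    },
    {"format": "pt"},
)
parsed = parse_safetensors_header(lambda start, end: demo[start : end + 1])
summary = summarize(parsed["tensors"])
print(summary)
```

Passing the transport in as a callable keeps the parsing logic independent of how bytes are fetched, so the same function can be exercised against an in-memory buffer, as here, or against a real range-capable endpoint.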