71 Commits

Author SHA1 Message Date
narugo1992
6b4269404a fix(seed): share SEED_VERSION between seed + verify scripts
verify_seed_data.py hardcoded EXPECTED_SEED_VERSION = "local-dev-demo-v3"
but seed_demo_data.py was bumped to v4 in the preview PR, so the
post-seed verifier would falsely fail with a version mismatch. Extract
the constant to scripts/dev/seed_shared.py and import it from both
sides so the two scripts always agree.
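
The fix above can be sketched in a single file as follows. This is a minimal illustration, not the repository's code: only `SEED_VERSION` and the `seed_shared.py` location come from the commit message, while the function names and the manifest shape are hypothetical.

```python
# Stand-in for scripts/dev/seed_shared.py: the single source of truth.
SEED_VERSION = "local-dev-demo-v4"


# Stand-in for seed_demo_data.py (hypothetical function name and manifest shape).
def write_seed_manifest() -> dict:
    # The seeder stamps whatever version the shared module declares.
    return {"seed_version": SEED_VERSION}


# Stand-in for verify_seed_data.py (hypothetical function name).
def verify_seed_manifest(manifest: dict) -> None:
    # The verifier compares against the same constant, so a version bump
    # in one place can no longer leave the two scripts out of sync.
    found = manifest.get("seed_version")
    if found != SEED_VERSION:
        raise ValueError(
            f"seed version mismatch: expected {SEED_VERSION!r}, found {found!r}"
        )
```

In the real layout both scripts would import the constant from the shared module instead of defining their own copy.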

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 14:02:33 +08:00
narugo1992
d0445282fb feat: pure-client safetensors/parquet metadata preview (#27)
Implements issue #27 v4: file-level HF-compatible metadata preview
computed entirely in the browser via HTTP Range reads against the
existing /resolve/ 302 → presigned S3/MinIO URL. Zero new backend
preview code, zero LRU, zero precomputation, zero new DB state.

Backend (minimal CORS plumbing only):
- main.py CORSMiddleware: add `expose_headers` so browsers can read
  Content-Range / X-Linked-* / X-Repo-Commit / ETag / Location off
  the final 206 response that follows the /resolve/ 302.
- docker-compose.example.yml + scripts/dev/up_infra.sh: wire
  `MINIO_API_CORS_ALLOW_ORIGIN` so the SPA can cross-origin Range-read
  presigned targets. Configurable via `DEV_MINIO_CORS_ALLOW_ORIGIN`.
- docs/development/local-dev.md: MinIO CORS section explaining the
  hard prerequisite + smoke-test probe + how to recreate the container.

Frontend:
- utils/safetensors.js (~190 LOC): pure-JS parser mirroring
  huggingface_hub.parse_safetensors_file_metadata byte-for-byte
  (speculative 100 KB first read, two-read fallback for fat headers,
  SAFETENSORS_MAX_HEADER_LENGTH guard). Exposes parseSafetensorsMetadata
  + summarizeSafetensors.
- utils/parquet.js: thin wrapper over hyparquet's asyncBufferFromUrl +
  parquetMetadataAsync with mode:"cors" + credentials:"omit" so cookies
  never leak onto presigned URLs. Normalizes BigInt row counts.
- components/repo/preview/FilePreviewDialog.vue: ElDialog with
  per-phase spinner text (range-head → parsing → done for safetensors,
  head → footer → parsing → done for parquet), dtype/row-group tables,
  and an explicit "CORS likely misconfigured" placeholder on failure.
- RepoViewer.vue: HF-style chart-line-data icon next to .safetensors
  and .parquet rows; click opens the modal with the resolved /resolve/
  URL for the current branch.
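
The speculative-read strategy described for utils/safetensors.js can be sketched in Python as below. This is an illustration under stated assumptions, not the shipped parser: `read_range` is a hypothetical callable standing in for an HTTP Range request, and both constants are assumed values chosen to mirror the "100 KB first read" and header-length guard mentioned above.

```python
import json
import struct
from typing import Callable

# Assumed values for illustration only.
SPECULATIVE_READ_BYTES = 100_000
MAX_HEADER_LENGTH = 25_000_000


def parse_safetensors_header(read_range: Callable[[int, int], bytes]) -> dict:
    """Parse the JSON header of a safetensors file.

    read_range(start, end) is a hypothetical helper that returns the bytes
    in the inclusive range [start, end], like an HTTP Range request.
    """
    # First read: grab a fixed-size prefix and hope the header fits in it.
    first = read_range(0, SPECULATIVE_READ_BYTES - 1)
    if len(first) < 8:
        raise ValueError("file too short to contain a safetensors header")

    # The first 8 bytes are the header length as a little-endian uint64.
    (header_len,) = struct.unpack("<Q", first[:8])
    if header_len > MAX_HEADER_LENGTH:
        raise ValueError(f"header too large: {header_len} bytes")

    if 8 + header_len <= len(first):
        # Common case: the whole header was already in the first read.
        raw = first[8 : 8 + header_len]
    else:
        # Fat header: issue a second read for exactly the header bytes.
        raw = read_range(8, 8 + header_len - 1)
        if len(raw) != header_len:
            raise ValueError("short read while fetching header")

    return json.loads(raw)
```

The guard runs before the second read, so a corrupt or hostile length prefix cannot trigger a large download.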

Tests + fixtures:
- test_files.py::test_resolve_get_302_exposes_cors_headers_for_browser_preview
  pins the `Access-Control-Expose-Headers` list against regressions.
- test/kohaku-hub-ui/utils/test_safetensors.test.js: 6 cases covering
  the real-HF-format fixture, dtype summary, progress phases, fat-header
  fallback, oversized-header guard, and non-206 error paths.
- test/kohaku-hub-ui/utils/test_parquet.test.js: footer parse +
  progress phase assertions.
- test/kohaku-hub-ui/fixtures/previews/{tiny.safetensors,tiny.parquet}:
  byte-identical-to-HF fixtures produced by the real safetensors /
  pyarrow libs via scripts/dev/generate_preview_test_fixtures.py
  (committed so tests stay offline per AGENTS.md §5.2).

Seed:
- seed_demo_data.py: add two RemoteAsset entries for real HF-hosted
  small fixtures pinned by sha256, and wire them into visible paths
  (open-media-lab/vision-language-assistant-3b/fixtures/hf-tiny-random-bert.safetensors,
  open-media-lab/multimodal-benchmark-suite/fixtures/hf-no-robots-test.parquet)
  so the preview can be exercised against files that actually came off
  huggingface.co rather than purely local pyarrow/safetensors output.
  SEED_VERSION bumped to local-dev-demo-v4.

Verified end-to-end against the dev stack: safetensors parser output
on the seeded fixtures matches huggingface_hub.parse_safetensors_file_metadata
byte-for-byte on the same file (100 tensors, 126,851 params, I64=512
/ F32=126,339, metadata `{format: pt, ...}`). Browser preview modal
renders both file kinds correctly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 13:41:26 +08:00
narugo1992
25f5412779 Seed HuggingFace as a global fallback source in make seed-demo
The local demo seed now installs https://huggingface.co as a low-priority
(priority=1000) global fallback source via the admin API, so a fresh
`make seed-demo` can resolve public HF repos out-of-the-box. Bumps the
seed version to local-dev-demo-v3 and updates verify_seed_data.py to
assert the seeded source is advertised via /api/fallback-sources/available.

The creation step is idempotent: it lists global sources first and skips
the insert when a matching URL already exists.
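
The list-then-skip pattern described above can be sketched as follows. This is a minimal illustration: the in-memory list stands in for the admin API, and the function name, record shape, and exact-match comparison are assumptions rather than the seed script's actual code.

```python
def ensure_fallback_source(existing: list, url: str, priority: int = 1000) -> bool:
    """Add a fallback source unless one with the same URL is already present.

    `existing` is a hypothetical stand-in for the list returned by the admin
    API. Returns True when a new source was added, False when it was skipped.
    """
    # Step 1: look through the current global sources for a matching URL
    # (exact string match here, which is an assumption of this sketch).
    for source in existing:
        if source.get("url") == url:
            return False  # already seeded, so do nothing

    # Step 2: nothing matched, so create the source.
    existing.append({"url": url, "priority": priority})
    return True
```

Running the step twice leaves a single entry, which is what makes a repeated `make seed-demo` safe.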

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 12:02:44 +08:00
narugo1992
8088fdb98a Fix repo tree path handling 2026-04-21 15:09:58 +08:00
narugo1992
03e95a6290 Expand local demo seed assets 2026-04-21 13:51:46 +08:00
narugo1992
f76a8d49f2 Refine local dev reset workflow 2026-04-21 12:50:30 +08:00
narugo1992
4c6dba6458 Add local development bootstrap and deterministic demo fixtures 2026-04-19 14:14:15 +08:00
Patryk Zdunowski
5071de22f3 chore(json-schema-generator): generate_json_schema.py script helpful for generating schemas for pydantic models 2026-01-30 18:21:54 +01:00
Kohaku-Blueleaf
3eb363f45a update config migrate script 2026-01-22 23:29:47 +08:00
Kohaku-Blueleaf
ef72901149 fix example docker config/generator 2026-01-22 23:29:07 +08:00
Kohaku-Blueleaf
3c8f7ac1e2 separate KohakuBoard into standalone repo 2025-10-29 16:57:23 +08:00
KohakuBlueleaf
a8598ef893 add kobo formatting script 2025-10-29 14:57:55 +08:00
Kohaku-Blueleaf
817344654b clean up redundant test script/temp doc 2025-10-28 21:15:46 +08:00
Kohaku-Blueleaf
9c15415487 update mock data gen 2025-10-27 17:27:06 +08:00
Kohaku-Blueleaf
37c431b3ea KohakuBoard related doc early version 2025-10-27 12:27:48 +08:00
Kohaku-Blueleaf
6f84e09f66 better deploy choice 2025-10-27 12:27:20 +08:00
Kohaku-Blueleaf
69b6560021 kohakuboard deploy 2025-10-27 12:14:25 +08:00
Kohaku-Blueleaf
a722054ecc Merge pull request #1 from ntrwansuiBC/main
Logger improvements with loguru
2025-10-25 22:41:25 +08:00
Kohaku-Blueleaf
3d8f5b6de6 build script for kohakuboard 2025-10-25 18:46:56 +08:00
lxy
980c1462c1 update: logger.py is now based on loguru.
feature: logs are now written to a file when app.log_format is set to file. Default path: logs/kohakuhub.log
2025-10-24 23:03:16 +08:00
Kohaku-Blueleaf
2a02025a22 improve backend implementation 2025-10-24 05:32:46 +08:00
Kohaku-Blueleaf
a74fa71280 add test data gen 2025-10-24 03:13:44 +08:00
Kohaku-Blueleaf
eb3f8420ea linting/formatting 2025-10-23 17:19:31 +08:00
Kohaku-Blueleaf
de77ddf2df update deploy things 2025-10-23 14:47:41 +08:00
Kohaku-Blueleaf
dd0f5a8021 better confirmation system 2025-10-23 02:35:27 +08:00
Kohaku-Blueleaf
edaee890db update doc and Docker related utils 2025-10-22 23:25:41 +08:00
Kohaku-Blueleaf
e33eee9f17 fix scripts bugs 2025-10-22 21:53:08 +08:00
Kohaku-Blueleaf
4885ebdd51 fix scripts bugs 2025-10-22 21:23:06 +08:00
Kohaku-Blueleaf
de7cc89e47 fix scripts bugs 2025-10-22 21:20:54 +08:00
Kohaku-Blueleaf
80315c53e9 update config and add migrate script 2025-10-22 21:09:28 +08:00
Kohaku-Blueleaf
140cb937ae Allow users to provide an external token 2025-10-22 20:52:16 +08:00
Kohaku-Blueleaf
105d7ff6c2 use correct size unit in every script 2025-10-22 02:43:31 +08:00
Kohaku-Blueleaf
40f4714c03 update config system for better fallback mech 2025-10-21 23:24:35 +08:00
Kohaku-Blueleaf
a4743c6133 add lfs multipart setting into config 2025-10-21 21:56:52 +08:00
Kohaku-Blueleaf
bede383535 fix errors 2025-10-20 22:58:25 +08:00
Kohaku-Blueleaf
69a271cfaf Add config gen and better formatting 2025-10-20 12:04:49 +08:00
Kohaku-Blueleaf
6b1e0024d1 correct shell handling 2025-10-19 20:45:50 +08:00
Kohaku-Blueleaf
4c04c69aa7 update formatting/deploy workflow 2025-10-19 20:36:57 +08:00
Kohaku-Blueleaf
53f739b216 add deploy script 2025-10-19 19:28:41 +08:00
Kohaku-Blueleaf
8f300adec6 use us-east-1 as default region to make minio happy 2025-10-19 00:21:28 +08:00
Kohaku-Blueleaf
392c4c8a68 modify config.py to make minio work 2025-10-19 00:11:33 +08:00
Kohaku-Blueleaf
3c080bc632 use None for s3v2 2025-10-18 23:57:01 +08:00
Kohaku-Blueleaf
f8e647be40 update region/signature setting 2025-10-18 23:51:19 +08:00
Kohaku-Blueleaf
b5b5c56711 Fix lakefs and s3 problem 2025-10-18 22:14:24 +08:00
Kohaku-Blueleaf
7ccf283444 Allow system/admin generated invitation 2025-10-18 11:30:36 +08:00
Kohaku-Blueleaf
c14f542f52 No migration when db not init 2025-10-18 11:25:33 +08:00
Kohaku-Blueleaf
1224aa5b64 Fix storage tracking of deleted file, Fix LFS dedup for global wise dedup 2025-10-18 03:42:00 +08:00
Kohaku-Blueleaf
266e1074f5 update migration 2025-10-17 23:05:08 +08:00
Kohaku-Blueleaf
77c6cb7c75 Update migration script impl 2025-10-17 20:43:54 +08:00
Kohaku-Blueleaf
99670e672c Fix import/formatting 2025-10-16 18:57:35 +08:00