mock data api for kohakuboard

Kohaku-Blueleaf
2025-10-25 18:46:06 +08:00
parent b45be04146
commit 09a0cd0c54
13 changed files with 1328 additions and 0 deletions

205
src/kohakuboard/LICENSE Normal file

@@ -0,0 +1,205 @@
# Kohaku Software License 1.0
**Published by KohakuBlueLeaf**
## Purpose
The **Kohaku Software License** aims to provide maximum freedom for users to work with the Software while protecting contributors from liability and ensuring the freedom of end users. It incorporates commercial usage restrictions to balance open access with sustainable development.
## Definitions
- **Software**: Refers to the source code, compiled binaries, libraries, modules, documentation, configuration files, and any other materials provided under this License.
- **Source Code**: The preferred form for making modifications to the Software, including all source files, build scripts, configuration files, and documentation necessary to understand, compile, and modify the Software.
- **Derivative Work**: Any software based on or derived from the original Software, including but not limited to:
- Modified versions of the Software
- Software that incorporates any portion of the Software
- Software that links to, imports, or otherwise depends on the Software in a manner that creates a combined work
For a Derivative Work to qualify under this license, it must include the complete Source Code necessary to build, use, and modify the Derivative Work.
- **Modify**: To alter, adapt, translate, or otherwise change the Software, or to create Derivative Works.
- **Service Provider**: An entity that uses the Software to offer services to **End Users**, thereby making the **End Users** the recipients of the service.
- **End User**: Any individual or entity that uses the Software directly or uses services provided by a **Service Provider** that utilizes the Software.
- **Non-Commercial Purpose**: Uses that do not involve direct or indirect monetary compensation arising from the use of the Software, including personal use, academic research, experimentation, testing, or non-commercial organizational use.
- **Commercial Usage**: Any use of the Software where:
- The Software is used to provide services or products to customers, clients, or users (internal or external) for monetary compensation, or
- The Software is incorporated into commercial products or services, or
- The Software is used as part of internal company systems that help internal teams execute their business operations in a for-profit organization, or
- The organization using the Software generates revenue from activities directly or indirectly involving the Software
- **Total Revenue**:
- For Service Providers: The total revenue generated from services utilizing the Software
- For product vendors: The total revenue from products incorporating the Software
- For internal business systems: The total revenue of the organization using the Software for business operations
## License Grant
### 1. General Permissions
Subject to compliance with this License, KohakuBlueLeaf grants you a non-exclusive, worldwide, non-transferable, non-sublicensable, revocable, royalty-free, and limited license to access, use, modify, create Derivative Works, and distribute the Software for **Non-Commercial Purposes** and **Commercial Usage** under certain conditions.
### 2. Categories of Use
#### a. Direct Users
Individuals or entities that use the Software directly for their personal, academic, or non-commercial purposes without operating in a commercial capacity.
#### b. Service Providers and Commercial Entities
Entities that use the Software to offer services or products to **End Users**, or that use the Software for internal business operations in a for-profit organization.
### 3. Source Code Availability
When using or distributing the Software or any Derivative Works, you must:
- Make the complete Source Code available to recipients
- Ensure the Source Code is in a form that allows recipients to build, modify, and use the Software
- Include all necessary build scripts, configuration files, and dependencies information
### 4. Derivative Works
Any Derivative Works created must be published under the **Kohaku Software License**. The minimal requirement includes:
- Complete Source Code of the Derivative Work
- Build and installation instructions
- Clear indication of what has been modified from the original Software
**Additional Requirements for Combined Works:**
- If the Derivative Work combines multiple software components or libraries, all such components that form a combined work must be published under this License or a compatible license.
- You must provide clear documentation on how the components interact and how to build the combined work.
- **Note**: You are not obligated to release proprietary business logic or workflows that use the Software through standard APIs or interfaces without creating Derivative Works.
## Restrictions
### 1. Commercial Usage
- **Definition**: **Commercial Usage** is defined as any use where:
- The Software is used to provide services or products to customers, clients, or users (internal or external) for monetary compensation
- The Software is incorporated into commercial products or services
- The Software is used as part of internal company systems that help internal teams execute their business operations in a for-profit organization
- The organization using the Software generates revenue from activities directly or indirectly involving the Software
- **Conditions for Requiring a Commercial License**: Commercial Usage is prohibited **if either** of the following conditions is met:
- **Total Revenue** attributable to or associated with the Software exceeds $25,000 USD per year, OR
- **Usage Duration** exceeds 3 months
- **Revenue Threshold and Usage Duration**:
- **Trial Period**: Entities are allowed to engage in **Commercial Usage** without a commercial license for a trial period of **up to 3 months**, provided their **Total Revenue** remains below or equal to $25,000 USD per year.
- **Revenue Limit**: Entities with **Total Revenue** attributable to or associated with the Software below or equal to $25,000 USD per year are permitted to continue **Commercial Usage** without a commercial license, provided the **Usage Duration** does not exceed 3 months.
- **Exceeding Either Threshold**: If an entity's **Total Revenue** exceeds $25,000 USD per year OR the **Commercial Usage** period exceeds 3 months, the entity must request a commercial license from the author.
- **Requesting a Commercial License**: Entities that need to engage in **Commercial Usage** exceeding either threshold must contact the author at kohaku@kblueleaf.net to request a commercial license. The author may grant such licenses at their sole discretion, potentially subject to fees, royalties, or revenue-sharing agreements.
### 2. Prohibited Uses
You may not use the Software for:
- Military purposes or weapons development
- Surveillance systems or mass monitoring
- Biometric identification or tracking systems
- Any activity that infringes on third-party rights
- Any use violating applicable laws, including privacy and security regulations
- Generating or distributing malware, exploits, or other malicious software
You may not:
- Alter or remove copyright and proprietary notices
- Circumvent or remove any security or usage restrictions
- Impose additional terms that conflict with this License
- Distribute the Software to prohibited individuals, entities, or countries as defined by applicable export laws
### 3. Distribution Requirements
When distributing the Software or any Derivative Works, you must:
- Include a copy of this License with the distribution
- Include the complete Source Code or provide clear instructions on how to obtain it
- **Attribution Notice**: Prominently display the following notice:
```
This Software is licensed under the Kohaku Software License by KohakuBlueLeaf.
Copyright 2025 KohakuBlueLeaf.
IN NO EVENT SHALL KohakuBlueLeaf BE LIABLE FOR ANY CLAIM, DAMAGES, OR OTHER
LIABILITY ARISING FROM THE USE OF THIS SOFTWARE.
```
- **For Derivative Works**:
- Include a statement clearly indicating that you have modified the original Software
- Document the nature of modifications made
- Ensure all Source Code is available under this License
- **No Misrepresentation**: Do not misrepresent or imply that Derivative Works are official versions or have been endorsed by the original author unless authorized in writing.
- **Service Provider Requirements**:
- **Service Providers** must provide **End Users** with clear notice that the service utilizes Software licensed under the Kohaku Software License
- Include a reference to the original Software and this License in service documentation, terms of service, or user interface (e.g., "About" page, footer)
## No Harm and No Liability
### 1. No Harm
You agree that no contributor's conduct in creating the Software has caused you harm. To the extent permitted by law, you waive the right to pursue any legal claims against contributors related to the creation of the Software.
### 2. No Liability
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES, OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT, OR OTHERWISE, ARISING FROM, OUT OF, OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
## Patent Grant
Each contributor grants you a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable patent license to make, use, offer to sell, sell, import, and otherwise transfer the Software, where such license applies only to those patent claims licensable by such contributor that are necessarily infringed by their contribution(s) alone or by combination of their contribution(s) with the Software.
## Interpretation of Ambiguous Terms
In the event of any ambiguity or uncertainty in the interpretation of the terms of this License, the Licensee has the right to interpret the ambiguous descriptions in a manner that aligns with the intended purpose of this License, which is to promote open access while protecting sustainable development through commercial licensing.
## Acceptance and Compliance
By using, modifying, or distributing the Software, you agree to comply with all terms of this License. Non-compliance may result in the automatic termination of your rights under this License.
## Termination
Your rights under this License terminate automatically upon any breach of its terms. Upon termination, you must:
- Cease all use, modification, and distribution of the Software and Derivative Works
- Destroy all copies of the Software in your possession or control
- If you are a Service Provider, cease providing services that utilize the Software
Sections regarding No Liability, Indemnification, and General Provisions survive termination.
## Indemnification
You agree to indemnify, defend, and hold harmless KohakuBlueLeaf and its affiliates, contributors, and licensors from and against any claims, damages, losses, liabilities, costs, and expenses (including reasonable attorneys' fees) arising from:
- Your use of the Software
- Your violation of this License
- Your violation of any rights of another party
- Your distribution of the Software or Derivative Works
## General Provisions
- **Governing Law**: This License is governed by the laws of Taiwan, without regard to conflict of law principles.
- **Severability**: If any provision of this License is held to be unenforceable or invalid, that provision shall be modified to the minimum extent necessary to make it enforceable, and the remaining provisions shall remain in full force and effect.
- **Entire Agreement**: This License constitutes the entire agreement between you and KohakuBlueLeaf regarding the Software and supersedes all prior agreements and understandings.
- **No Waiver**: The failure of KohakuBlueLeaf to enforce any provision of this License shall not constitute a waiver of that provision or any other provision.
- **Assignment**: You may not assign or transfer your rights or obligations under this License without prior written consent from KohakuBlueLeaf.
## Revisions
KohakuBlueLeaf may publish revised versions of the Kohaku Software License from time to time. Each version will be given a distinguishing version number. You may choose to use the Software under the terms of the version of the License under which you originally received the Software, or under the terms of any subsequent version published by KohakuBlueLeaf.
## Contact
For commercial licensing inquiries, please contact: kohaku@kblueleaf.net

46
src/kohakuboard/README.md Normal file

@@ -0,0 +1,46 @@
# KohakuBoard - Backend
Minimal experiment tracking backend with high-performance data serving.
## Features
- FastAPI-based REST API
- Sparse metric logging support
- Multiple data types: scalars, media, tables, histograms
- Step-indexed data structure
- Mock data generation for testing
## License
**Kohaku Software License 1.0**
This is a premium feature of KohakuHub with commercial usage restrictions.
- ✅ Free for non-commercial use
- ⚠️ Commercial trial: up to 3 months, with revenue of at most $25k/year
- ⚠️ Exceeding either limit requires a commercial license
Contact: kohaku@kblueleaf.net
## Installation
```bash
pip install -e .
```
## Development
```bash
uvicorn kohakuboard.main:app --reload --port 48889
```
API docs: http://localhost:48889/docs
## API Endpoints
- `GET /api/experiments` - List experiments
- `GET /api/experiments/{id}/summary` - Get experiment summary
- `GET /api/experiments/{id}/scalars/{name}` - Get scalar metric data
- `GET /api/experiments/{id}/media/{name}` - Get media log
- `GET /api/experiments/{id}/tables/{name}` - Get table log
- `GET /api/experiments/{id}/histograms/{name}` - Get histogram log
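
As a rough illustration of how a client might address these endpoints, the sketch below builds request URLs with the standard library only. The helper `endpoint_url` and the `DATA_KINDS` tuple are hypothetical names introduced for this example; the base URL assumes the development server from the section above.

```python
from typing import Optional
from urllib.parse import quote

# Assumption: the dev server from the "Development" section is running locally.
BASE_URL = "http://localhost:48889"
# The four per-name endpoint families listed above.
DATA_KINDS = ("scalars", "media", "tables", "histograms")


def endpoint_url(
    experiment_id: str, kind: str = "summary", name: Optional[str] = None
) -> str:
    """Build the URL for one of the experiment endpoints (hypothetical helper)."""
    root = f"{BASE_URL}/api/experiments/{quote(experiment_id, safe='')}"
    if kind == "summary":
        return f"{root}/summary"
    if kind not in DATA_KINDS:
        raise ValueError(f"unknown data kind: {kind!r}")
    if name is None:
        raise ValueError("a metric/log name is required for this endpoint")
    return f"{root}/{kind}/{quote(name, safe='')}"


print(endpoint_url("exp-001"))
print(endpoint_url("exp-001", "scalars", "train_loss"))
```

The resulting strings can be passed to any HTTP client, for example `urllib.request.urlopen`.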


@@ -0,0 +1,3 @@
"""KohakuBoard - ML Experiment Tracking System"""
__version__ = "0.1.0"


@@ -0,0 +1 @@
"""API module for KohakuBoard"""


@@ -0,0 +1 @@
"""API routers for KohakuBoard"""


@@ -0,0 +1,321 @@
"""Experiments API endpoints"""
import random
from fastapi import APIRouter, HTTPException, Query
from pydantic import BaseModel
from typing import List, Optional
from kohakuboard.api.utils.mock_data import (
generate_experiment,
generate_metrics_data,
generate_sparse_metrics_data,
generate_histogram_data,
generate_table_data,
)
from kohakuboard.config import cfg
from kohakuboard.logger import logger_api
router = APIRouter()
# Mock experiment storage with large datasets for testing WebGL performance
MOCK_EXPERIMENTS = {
"exp-001": generate_experiment(
"exp-001", "ResNet50 Training (1K steps)", steps=1000, status="completed"
),
"exp-002": generate_experiment(
"exp-002", "BERT Fine-tuning (10K steps)", steps=10000, status="running"
),
"exp-003": generate_experiment(
"exp-003", "ViT Pretraining (50K steps)", steps=50000, status="completed"
),
"exp-004": generate_experiment(
"exp-004", "GPT-2 Training (100K steps)", steps=100000, status="completed"
),
"exp-005": generate_experiment(
"exp-005", "Stable Diffusion (25K steps)", steps=25000, status="stopped"
),
}
class MetricsQuery(BaseModel):
"""Query parameters for metrics"""
metric_names: Optional[List[str]] = None
start_step: Optional[int] = None
end_step: Optional[int] = None
@router.get("/experiments")
async def list_experiments():
"""List all experiments"""
logger_api.info("Fetching experiments list")
return list(MOCK_EXPERIMENTS.values())
@router.get("/experiments/{experiment_id}")
async def get_experiment(experiment_id: str):
"""Get experiment details"""
logger_api.info(f"Fetching experiment: {experiment_id}")
if experiment_id not in MOCK_EXPERIMENTS:
raise HTTPException(status_code=404, detail="Experiment not found")
return MOCK_EXPERIMENTS[experiment_id]
@router.get("/experiments/{experiment_id}/metrics")
async def get_metrics(
    experiment_id: str,
    metric_names: Optional[str] = Query(
        None, description="Comma-separated metric names"
    ),
    start_step: Optional[int] = Query(None, description="Start step"),
    end_step: Optional[int] = Query(None, description="End step"),
    steps: Optional[int] = Query(
        None, description="Number of steps", le=cfg.mock.max_steps
    ),
):
    """Get metrics data for an experiment"""
    logger_api.info(f"Fetching metrics for experiment: {experiment_id}")
    if experiment_id not in MOCK_EXPERIMENTS:
        raise HTTPException(status_code=404, detail="Experiment not found")
    if steps is None:
        steps = MOCK_EXPERIMENTS[experiment_id]["total_steps"]
    # Parse metric names
    metrics = None
    if metric_names:
        metrics = [m.strip() for m in metric_names.split(",")]
    # Generate metrics data
    metrics_data = generate_metrics_data(steps=steps, metrics=metrics)
    # Filter by step range if provided. Compare against None explicitly so
    # that end_step=0 is honored instead of being treated as "not set".
    if start_step is not None or end_step is not None:
        for metric in metrics_data:
            start = start_step if start_step is not None else 0
            end = end_step if end_step is not None else len(metric["x"])
            metric["x"] = metric["x"][start:end]
            metric["y"] = metric["y"][start:end]
    return {"experiment_id": experiment_id, "metrics": metrics_data}
@router.get("/experiments/{experiment_id}/summary")
async def get_experiment_summary(experiment_id: str):
"""Get experiment summary with available data"""
logger_api.info(f"Fetching summary for experiment: {experiment_id}")
if experiment_id not in MOCK_EXPERIMENTS:
raise HTTPException(status_code=404, detail="Experiment not found")
sample = generate_sparse_metrics_data(total_events=100)
return {
"experiment_id": experiment_id,
"experiment_info": MOCK_EXPERIMENTS[experiment_id],
"total_steps": MOCK_EXPERIMENTS[experiment_id]["total_steps"],
"available_data": {
"scalars": [k for k in sample.keys() if k != "time"],
"media": ["generated_images", "model_predictions", "attention_maps"],
"tables": ["validation_results", "layer_stats", "confusion_matrix"],
"histograms": ["gradients", "weights", "activations"],
},
}
@router.get("/experiments/{experiment_id}/scalars/{metric_name}")
async def get_scalar_metric(experiment_id: str, metric_name: str):
"""Get scalar metric as step-value pairs"""
logger_api.info(f"Fetching scalar '{metric_name}' for experiment: {experiment_id}")
if experiment_id not in MOCK_EXPERIMENTS:
raise HTTPException(status_code=404, detail="Experiment not found")
total_steps = MOCK_EXPERIMENTS[experiment_id]["total_steps"]
full_data = generate_sparse_metrics_data(total_events=total_steps)
if metric_name not in full_data:
raise HTTPException(status_code=404, detail=f"Metric '{metric_name}' not found")
# Return as step-value pairs (filter out None values)
data = []
for i, value in enumerate(full_data[metric_name]):
if value is not None:
data.append({"step": i, "value": value})
return {"experiment_id": experiment_id, "metric_name": metric_name, "data": data}
@router.get("/experiments/{experiment_id}/media/{media_name}")
async def get_media_log(experiment_id: str, media_name: str):
"""Get media log entries"""
logger_api.info(f"Fetching media '{media_name}' for experiment: {experiment_id}")
if experiment_id not in MOCK_EXPERIMENTS:
raise HTTPException(status_code=404, detail="Experiment not found")
# Mock media data with real placeholder URLs
media_entries = []
total_steps = MOCK_EXPERIMENTS[experiment_id]["total_steps"]
log_every = 1000 # Log media every 1000 steps
for step in range(0, total_steps, log_every):
media_entries.append(
{
"step": step,
"type": "image",
"url": f"https://picsum.photos/seed/{experiment_id}-{media_name}-{step}/512/512",
"caption": f"{media_name} at step {step}",
}
)
return {
"experiment_id": experiment_id,
"media_name": media_name,
"data": media_entries,
}
@router.get("/experiments/{experiment_id}/tables/{table_name}")
async def get_table_log(experiment_id: str, table_name: str):
"""Get table log entries"""
logger_api.info(f"Fetching table '{table_name}' for experiment: {experiment_id}")
if experiment_id not in MOCK_EXPERIMENTS:
raise HTTPException(status_code=404, detail="Experiment not found")
total_steps = MOCK_EXPERIMENTS[experiment_id]["total_steps"]
log_every = 5000
table_entries = []
for step in range(0, total_steps, log_every):
step_num = step // log_every
table_entries.append(
{
"step": step,
"columns": [
"Sample",
"Image",
"Precision",
"Recall",
"F1-Score",
"Support",
],
"column_types": [
"text",
"image",
"number",
"number",
"number",
"number",
],
"rows": [
[
"Cat",
f"https://picsum.photos/seed/{experiment_id}-cat-{step}/64/64",
0.85 + random.random() * 0.1,
0.80 + random.random() * 0.1,
0.82 + random.random() * 0.1,
120,
],
[
"Dog",
f"https://picsum.photos/seed/{experiment_id}-dog-{step}/64/64",
0.88 + random.random() * 0.1,
0.85 + random.random() * 0.1,
0.86 + random.random() * 0.1,
150,
],
[
"Bird",
f"https://picsum.photos/seed/{experiment_id}-bird-{step}/64/64",
0.75 + random.random() * 0.1,
0.70 + random.random() * 0.1,
0.72 + random.random() * 0.1,
80,
],
],
}
)
return {
"experiment_id": experiment_id,
"table_name": table_name,
"data": table_entries,
}
@router.get("/experiments/{experiment_id}/histograms/{histogram_name}")
async def get_histogram_log(experiment_id: str, histogram_name: str):
"""Get histogram log entries"""
logger_api.info(
f"Fetching histogram '{histogram_name}' for experiment: {experiment_id}"
)
if experiment_id not in MOCK_EXPERIMENTS:
raise HTTPException(status_code=404, detail="Experiment not found")
total_steps = MOCK_EXPERIMENTS[experiment_id]["total_steps"]
log_every = 2000
histogram_entries = []
for step in range(0, total_steps, log_every):
histogram_entries.append(
{
"step": step,
"bins": 50,
"values": [
random.gauss(0, 1 - step / total_steps) for _ in range(10000)
],
}
)
return {
"experiment_id": experiment_id,
"histogram_name": histogram_name,
"data": histogram_entries,
}
# NOTE: "/experiments/{experiment_id}/histograms/{histogram_name}" is already
# registered by get_histogram_log above. FastAPI matches routes in registration
# order, so this handler is shadowed and never called as written.
@router.get("/experiments/{experiment_id}/histograms/{histogram_name}")
async def get_histogram(
    experiment_id: str,
    histogram_name: str,
    num_values: int = Query(10000, description="Number of data points", le=1000000),
    distribution: str = Query(
        "normal", description="Distribution type (normal, uniform, exponential)"
    ),
):
    """Get histogram data"""
    logger_api.info(
        f"Fetching histogram '{histogram_name}' for experiment: {experiment_id}"
    )
    if experiment_id not in MOCK_EXPERIMENTS:
        raise HTTPException(status_code=404, detail="Experiment not found")
    histogram_data = generate_histogram_data(
        num_values=num_values, distribution=distribution
    )
    return histogram_data
# NOTE: "/experiments/{experiment_id}/tables/{table_name}" is likewise already
# registered by get_table_log above, so this handler is shadowed as well.
@router.get("/experiments/{experiment_id}/tables/{table_name}")
async def get_table(
    experiment_id: str,
    table_name: str,
    num_rows: int = Query(100, description="Number of rows", le=10000),
    num_cols: int = Query(6, description="Number of columns", le=50),
):
    """Get table data"""
    logger_api.info(f"Fetching table '{table_name}' for experiment: {experiment_id}")
    if experiment_id not in MOCK_EXPERIMENTS:
        raise HTTPException(status_code=404, detail="Experiment not found")
    table_data = generate_table_data(num_rows=num_rows, num_cols=num_cols)
    return table_data
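
Reviewer note: `get_scalar_metric` in this file turns a sparse series (with `None` wherever nothing was logged at that step) into step-value pairs. A minimal sketch of that conversion follows; the helper name `to_step_value_pairs` and the sample values are made up for illustration and are not part of the commit.

```python
from typing import Any, Dict, List, Optional


def to_step_value_pairs(values: List[Optional[float]]) -> List[Dict[str, Any]]:
    """Keep only the steps that were actually logged (hypothetical helper)."""
    return [
        {"step": step, "value": value}
        for step, value in enumerate(values)
        if value is not None
    ]


# A metric logged at steps 0 and 3 only; the other steps hold None.
sparse = [2.8, None, None, 2.5, None]
pairs = to_step_value_pairs(sparse)
print(pairs)
```

Dropping the `None` entries keeps the response proportional to the number of logged values rather than to the total step count.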


@@ -0,0 +1,119 @@
"""Mock data generation API endpoints"""
from fastapi import APIRouter, HTTPException
from pydantic import BaseModel, Field
from typing import List, Optional
from kohakuboard.api.utils.mock_data import (
generate_metrics_data,
generate_histogram_data,
generate_scatter_data,
generate_table_data,
)
from kohakuboard.config import cfg
from kohakuboard.logger import logger_mock
router = APIRouter()
class MockMetricsConfig(BaseModel):
"""Configuration for mock metrics generation"""
steps: int = Field(
default=100000, le=cfg.mock.max_steps, description="Number of steps"
)
metrics: Optional[List[str]] = Field(default=None, description="Metric names")
class MockHistogramConfig(BaseModel):
"""Configuration for mock histogram generation"""
num_values: int = Field(
default=10000, le=1000000, description="Number of data points"
)
distribution: str = Field(default="normal", description="Distribution type")
mean: float = Field(default=0.0, description="Mean value")
std: float = Field(default=1.0, description="Standard deviation")
class MockScatterConfig(BaseModel):
"""Configuration for mock scatter plot generation"""
num_points: int = Field(default=1000, le=100000, description="Number of points")
correlation: float = Field(
default=0.7, ge=-1.0, le=1.0, description="Correlation coefficient"
)
class MockTableConfig(BaseModel):
"""Configuration for mock table generation"""
num_rows: int = Field(default=100, le=10000, description="Number of rows")
num_cols: int = Field(default=6, le=50, description="Number of columns")
@router.post("/mock/metrics")
async def generate_mock_metrics(config: MockMetricsConfig):
"""Generate mock metrics data"""
logger_mock.info(
f"Generating mock metrics: steps={config.steps}, metrics={config.metrics}"
)
try:
data = generate_metrics_data(steps=config.steps, metrics=config.metrics)
return {"metrics": data}
except Exception as e:
logger_mock.error(f"Failed to generate mock metrics: {e}")
raise HTTPException(status_code=500, detail=str(e))
@router.post("/mock/histogram")
async def generate_mock_histogram(config: MockHistogramConfig):
"""Generate mock histogram data"""
logger_mock.info(
f"Generating mock histogram: num_values={config.num_values}, distribution={config.distribution}"
)
try:
data = generate_histogram_data(
num_values=config.num_values,
distribution=config.distribution,
mean=config.mean,
std=config.std,
)
return data
except Exception as e:
logger_mock.error(f"Failed to generate mock histogram: {e}")
raise HTTPException(status_code=500, detail=str(e))
@router.post("/mock/scatter")
async def generate_mock_scatter(config: MockScatterConfig):
"""Generate mock scatter plot data"""
logger_mock.info(
f"Generating mock scatter: num_points={config.num_points}, correlation={config.correlation}"
)
try:
data = generate_scatter_data(
num_points=config.num_points, correlation=config.correlation
)
return {"scatter": data}
except Exception as e:
logger_mock.error(f"Failed to generate mock scatter: {e}")
raise HTTPException(status_code=500, detail=str(e))
@router.post("/mock/table")
async def generate_mock_table(config: MockTableConfig):
"""Generate mock table data"""
logger_mock.info(
f"Generating mock table: rows={config.num_rows}, cols={config.num_cols}"
)
try:
data = generate_table_data(num_rows=config.num_rows, num_cols=config.num_cols)
return data
except Exception as e:
logger_mock.error(f"Failed to generate mock table: {e}")
raise HTTPException(status_code=500, detail=str(e))
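
Reviewer note: the request models in this file define the JSON body each `POST /mock/...` endpoint expects. As a hedged sketch, a body for the histogram endpoint could be assembled like this. The helper `histogram_request_body` is hypothetical; the field names, defaults, and the upper bound are taken from `MockHistogramConfig`, and the router's URL prefix is not shown in this commit, so no full URL is assumed.

```python
import json

# Mirrors Field(le=1000000) on MockHistogramConfig.num_values.
MAX_NUM_VALUES = 1_000_000


def histogram_request_body(
    num_values: int = 10000,
    distribution: str = "normal",
    mean: float = 0.0,
    std: float = 1.0,
) -> str:
    """Serialize a body matching MockHistogramConfig (hypothetical helper)."""
    if num_values > MAX_NUM_VALUES:
        # The server-side model would reject this value as well.
        raise ValueError(f"num_values must be <= {MAX_NUM_VALUES}")
    return json.dumps(
        {
            "num_values": num_values,
            "distribution": distribution,
            "mean": mean,
            "std": std,
        }
    )


body = histogram_request_body(num_values=5000, distribution="uniform")
print(body)
```

Checking the bound on the client side is optional; it only avoids a round trip for a request the model would reject.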


@@ -0,0 +1,126 @@
"""Experiment runs API with better data structure"""
from fastapi import APIRouter, HTTPException, Query
from typing import List, Optional
from kohakuboard.api.utils.mock_data import generate_sparse_metrics_data
from kohakuboard.logger import logger_api
router = APIRouter()
# Mock runs data
MOCK_RUNS = {
"run-001": {
"id": "run-001",
"name": "ResNet50 Training",
"status": "running",
"created_at": "2025-01-15T10:00:00Z",
}
}
@router.get("/runs/{run_id}/summary")
async def get_run_summary(run_id: str):
"""Get run summary with all available metrics"""
logger_api.info(f"Fetching summary for run: {run_id}")
if run_id not in MOCK_RUNS:
raise HTTPException(status_code=404, detail="Run not found")
# Generate sample data to get metric names
sample_data = generate_sparse_metrics_data(total_events=100)
return {
"run_id": run_id,
"run_info": MOCK_RUNS[run_id],
"total_events": 100000,
"available_metrics": {
"scalars": [k for k in sample_data.keys() if k != "time"],
"images": ["generated_samples", "confusion_matrix"],
"tables": ["validation_results", "layer_stats"],
},
}
@router.get("/runs/{run_id}/scalars/{metric_name}")
async def get_scalar_values(
    run_id: str,
    metric_name: str,
    start_event: Optional[int] = Query(None),
    end_event: Optional[int] = Query(None),
):
    """Get scalar values for a specific metric"""
    logger_api.info(f"Fetching scalar '{metric_name}' for run: {run_id}")
    if run_id not in MOCK_RUNS:
        raise HTTPException(status_code=404, detail="Run not found")
    # Generate full dataset
    full_data = generate_sparse_metrics_data(total_events=100000)
    if metric_name not in full_data:
        raise HTTPException(status_code=404, detail=f"Metric '{metric_name}' not found")
    # Compare against None explicitly so that end_event=0 is honored
    # instead of being treated as "not set".
    start = start_event if start_event is not None else 0
    end = end_event if end_event is not None else len(full_data["time"])
    return {
        "run_id": run_id,
        "metric_name": metric_name,
        "time": full_data["time"][start:end],
        "values": full_data[metric_name][start:end],
    }
@router.get("/runs/{run_id}/images/{image_name}")
async def get_image_log(run_id: str, image_name: str, limit: int = Query(100, le=1000)):
"""Get image log entries"""
logger_api.info(f"Fetching images '{image_name}' for run: {run_id}")
if run_id not in MOCK_RUNS:
raise HTTPException(status_code=404, detail="Run not found")
# Mock image data
images = []
for i in range(min(limit, 10)):
images.append(
{
"step": i * 1000,
"url": f"https://via.placeholder.com/256x256?text=Step+{i * 1000}",
"caption": f"Generated sample at step {i * 1000}",
}
)
return {"run_id": run_id, "image_name": image_name, "images": images}
@router.get("/runs/{run_id}/tables/{table_name}")
async def get_table_log(run_id: str, table_name: str):
"""Get table log with optional image columns"""
logger_api.info(f"Fetching table '{table_name}' for run: {run_id}")
if run_id not in MOCK_RUNS:
raise HTTPException(status_code=404, detail="Run not found")
# Mock table with images
columns = ["ID", "Name", "Score", "Image", "Status"]
column_types = ["number", "text", "number", "image", "text"]
rows = []
for i in range(20):
rows.append(
[
i + 1,
f"Sample_{i + 1}",
round(0.5 + (i / 20) * 0.5, 3),
f"https://via.placeholder.com/64x64?text={i + 1}",
"Pass" if i % 3 == 0 else "Fail",
]
)
return {
"run_id": run_id,
"table_name": table_name,
"columns": columns,
"column_types": column_types,
"rows": rows,
}
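
Reviewer note: the `start_event`/`end_event` handling in `get_scalar_values` comes down to list slicing with optional bounds. The sketch below compares explicit `None` checks with the `value or default` idiom when a bound is `0`. The helper `slice_events` is hypothetical and not part of the commit.

```python
from typing import List, Optional, TypeVar

T = TypeVar("T")


def slice_events(
    values: List[T], start: Optional[int] = None, end: Optional[int] = None
) -> List[T]:
    """Slice with optional bounds, treating only None as "not set"."""
    lo = 0 if start is None else start
    hi = len(values) if end is None else end
    return values[lo:hi]


events = list(range(10))
print(slice_events(events, 2, 5))      # [2, 3, 4]
print(slice_events(events, end=0))     # [] because an end bound of 0 is honored
print(events[0:(0 or len(events))])    # the `or` idiom falls back to all 10 items
```

Because `0` is falsy in Python, `end or len(values)` cannot distinguish "end at 0" from "no end given", which is why the explicit check is the safer form for query parameters.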


@@ -0,0 +1 @@
"""API utilities for KohakuBoard"""


@@ -0,0 +1,351 @@
"""Mock data generation utilities"""
import random
import math
from datetime import datetime, timedelta, timezone
from typing import List, Dict, Any
from kohakuboard.config import cfg
def generate_time_series_data(
    steps: int = None,
    start_value: float = 1.0,
    trend: str = "decreasing",
    noise_level: float = None,
    smoothness: float = 0.95,
) -> List[float]:
    """
    Generate realistic time series data
    Args:
        steps: Number of data points (defaults to cfg.mock.default_steps)
        start_value: Starting value
        trend: 'increasing', 'decreasing', 'stable', or 'oscillating';
            any other value is treated as 'stable'
        noise_level: Standard deviation of the Gaussian noise, as a fraction
            of start_value (defaults to cfg.mock.default_noise_level)
        smoothness: Exponential smoothing factor (0.0 to 1.0); higher values
            give smoother curves
    Returns:
        List of values
    """
    if steps is None:
        steps = cfg.mock.default_steps
    if noise_level is None:
        noise_level = cfg.mock.default_noise_level
    values = []
    current_value = start_value
    smoothed_value = start_value
    for step in range(steps):
        # Calculate trend component
        progress = step / max(steps - 1, 1)
        match trend:
            case "decreasing":
                trend_value = start_value * math.exp(-3 * progress)
            case "increasing":
                trend_value = start_value * (1 + 2 * progress)
            case "stable":
                trend_value = start_value
            case "oscillating":
                trend_value = start_value * (1 + 0.3 * math.sin(10 * progress))
            case _:
                trend_value = start_value
        # Add noise
        noise = random.gauss(0, noise_level * start_value)
        # Combine and smooth
        current_value = trend_value + noise
        smoothed_value = smoothness * smoothed_value + (1 - smoothness) * current_value
        values.append(smoothed_value)
    return values
def generate_sparse_metrics_data(
    total_events: int = 1000, metrics_config: List[Dict[str, Any]] = None
) -> Dict[str, List[Any]]:
    """
    Generate sparse multi-metric logging data
    Args:
        total_events: Total number of logging events
        metrics_config: List of metric configurations with logging frequency
    Returns:
        Dict mapping metric names to lists with None for missing values
    """
    if metrics_config is None:
        metrics_config = [
            {
                "name": "train_loss",
                "log_every": 1,
                "type": "loss",
                "start": 2.5,
                "noise": 0.08,
            },
            {
                "name": "train_accuracy",
                "log_every": 1,
                "type": "accuracy",
                "start": 0.3,
                "noise": 0.015,
            },
            {
                "name": "val_loss",
                "log_every": 10,
                "type": "loss",
                "start": 2.8,
                "noise": 0.12,
            },
            {
                "name": "val_accuracy",
                "log_every": 10,
                "type": "accuracy",
                "start": 0.25,
                "noise": 0.02,
            },
            {
                "name": "learning_rate",
                "log_every": 5,
                "type": "lr",
                "start": 0.001,
                "noise": 0,
            },
            {"name": "step", "log_every": 1, "type": "step"},
        ]
    result = {"time": list(range(total_events))}
    for config in metrics_config:
        metric_name = config["name"]
        log_every = config["log_every"]
        metric_type = config.get("type", "default")
        start_val = config.get("start", 1.0)
        noise_level = config.get("noise", 0.1)
        values = []
        value_index = 0
        for i in range(total_events):
            if i % log_every == 0:
                if metric_type == "loss":
                    base_value = start_val * (0.95 ** (value_index / 10))
                    value = base_value + random.gauss(0, noise_level)
                elif metric_type == "accuracy":
                    progress = value_index / (total_events / log_every)
                    base_value = min(0.99, start_val + progress * 0.65)
                    value = base_value + random.gauss(0, noise_level)
                elif metric_type == "lr":
                    value = start_val * (0.99 ** (value_index / 10))
                elif metric_type == "step":
                    value = i
                else:
                    value = random.random()
                values.append(value)
                value_index += 1
            else:
                values.append(None)
        result[metric_name] = values
    return result


def generate_metrics_data(
    steps: int = None, metrics: List[str] = None
) -> List[Dict[str, Any]]:
    """
    Generate mock metrics data for line plots

    Args:
        steps: Number of steps
        metrics: List of metric names

    Returns:
        List of metric series
    """
    if steps is None:
        steps = cfg.mock.default_steps
    if metrics is None:
        metrics = ["train_loss", "val_loss", "train_accuracy", "val_accuracy"]

    result = []
    for metric_name in metrics:
        x_values = list(range(steps))

        # Configure based on metric type
        if "loss" in metric_name:
            y_values = generate_time_series_data(
                steps=steps,
                start_value=2.5 if "train" in metric_name else 2.8,
                trend="decreasing",
                noise_level=0.05,
                smoothness=0.95,
            )
        elif "accuracy" in metric_name:
            y_values = generate_time_series_data(
                steps=steps,
                start_value=0.3 if "train" in metric_name else 0.25,
                trend="increasing",
                noise_level=0.02,
                smoothness=0.97,
            )
        else:
            y_values = generate_time_series_data(
                steps=steps,
                start_value=1.0,
                trend="stable",
                noise_level=0.1,
                smoothness=0.9,
            )

        result.append({"name": metric_name, "x": x_values, "y": y_values})

    return result


def generate_histogram_data(
    num_values: int = 10000,
    distribution: str = "normal",
    mean: float = 0.0,
    std: float = 1.0,
) -> Dict[str, Any]:
    """
    Generate histogram data

    Args:
        num_values: Number of data points
        distribution: 'normal', 'uniform', or 'exponential'; any other
            value falls back to 'normal'
        mean: Mean of the normal distribution, or center of the uniform range
        std: Standard deviation (normal), half-width of the range (uniform),
            or mean of the distribution (exponential, must be non-zero)

    Returns:
        Histogram data dict
    """
    match distribution:
        case "normal":
            values = [random.gauss(mean, std) for _ in range(num_values)]
        case "uniform":
            values = [random.uniform(mean - std, mean + std) for _ in range(num_values)]
        case "exponential":
            # expovariate takes the rate (1 / mean); `mean` is not used here
            values = [random.expovariate(1 / std) for _ in range(num_values)]
        case _:
            values = [random.gauss(mean, std) for _ in range(num_values)]

    return {
        "values": values,
        "bins": 50,
        "name": f"{distribution.capitalize()} Distribution",
    }


def generate_scatter_data(
    num_points: int = 1000, correlation: float = 0.7
) -> List[Dict[str, Any]]:
    """
    Generate scatter plot data

    Args:
        num_points: Number of data points
        correlation: Correlation between x and y (-1.0 to 1.0);
            values outside this range are clamped

    Returns:
        List of scatter series
    """
    # Clamp so that 1 - correlation**2 never goes negative under the sqrt
    correlation = max(-1.0, min(1.0, correlation))

    x_values = [random.gauss(0, 1) for _ in range(num_points)]
    y_values = [
        correlation * x + math.sqrt(1 - correlation**2) * random.gauss(0, 1)
        for x in x_values
    ]

    # Generate color values based on distance from origin
    colors = [math.sqrt(x**2 + y**2) for x, y in zip(x_values, y_values)]

    return [{"name": "Data Points", "x": x_values, "y": y_values, "color": colors}]
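

# Illustrative sketch (not part of the original module): generate_scatter_data
# builds y = r * x + sqrt(1 - r^2) * e from independent standard normals x and
# e, which gives y unit variance and a population correlation of r with x. The
# helper name `sample_correlation` is an assumption made for this example; it
# computes the Pearson coefficient so the construction can be checked.
def sample_correlation(xs, ys):
    """Pearson correlation of two equal-length sequences (needs >= 2 points)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Usage, assuming the series returned by generate_scatter_data above:
#     series = generate_scatter_data(num_points=20000, correlation=0.7)[0]
#     sample_correlation(series["x"], series["y"])  # near 0.7, up to sampling noise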


def generate_table_data(num_rows: int = 100, num_cols: int = 6) -> Dict[str, Any]:
    """
    Generate table data

    Args:
        num_rows: Number of rows
        num_cols: Number of columns

    Returns:
        Table data dict
    """
    columns = [f"Column_{i+1}" for i in range(num_cols)]

    rows = []
    for i in range(num_rows):
        row = [
            i + 1,  # ID column
            f"Item_{i+1}",  # Name column
            round(random.uniform(0, 100), 2),  # Value 1
            round(random.uniform(0, 1), 4),  # Value 2
            random.choice(["A", "B", "C", "D"]),  # Category
            round(random.uniform(0, 10), 1),  # Value 3
        ]
        # Pad with extra numeric values so every row matches len(columns)
        # even when num_cols exceeds the six predefined fields
        while len(row) < num_cols:
            row.append(round(random.uniform(0, 100), 2))
        rows.append(row[:num_cols])

    return {"columns": columns, "rows": rows}


def generate_experiment(
    experiment_id: str, name: str, steps: int = None, status: str = "completed"
) -> Dict[str, Any]:
    """
    Generate a complete experiment with all data

    Args:
        experiment_id: Experiment ID
        name: Experiment name
        steps: Number of training steps
        status: Experiment status

    Returns:
        Complete experiment data
    """
    if steps is None:
        steps = cfg.mock.default_steps

    # Start time is 1 to 168 hours (up to one week) in the past
    created_at = datetime.now(timezone.utc) - timedelta(hours=random.randint(1, 168))
    duration_seconds = random.randint(600, 14400)  # 10 min to 4 hours

    return {
        "id": experiment_id,
        "name": name,
        "description": "Mock experiment for testing KohakuBoard visualization",
        "status": status,
        "total_steps": steps,
        "duration": format_duration(duration_seconds),
        "created_at": created_at.isoformat(),
        "updated_at": (created_at + timedelta(seconds=duration_seconds)).isoformat(),
        "config": {
            "learning_rate": round(random.uniform(1e-5, 1e-2), 6),
            "batch_size": random.choice([16, 32, 64, 128]),
            "optimizer": random.choice(["Adam", "SGD", "AdamW"]),
            "model": random.choice(["ResNet50", "ViT-B/16", "BERT-base"]),
        },
    }


def format_duration(seconds: int) -> str:
    """Format duration in human-readable format"""
    hours = seconds // 3600
    minutes = (seconds % 3600) // 60
    secs = seconds % 60

    if hours > 0:
        return f"{hours}h {minutes}m"
    elif minutes > 0:
        return f"{minutes}m {secs}s"
    else:
        return f"{secs}s"

58
src/kohakuboard/config.py Normal file
@@ -0,0 +1,58 @@
"""Configuration for KohakuBoard"""

import os
from dataclasses import dataclass


@dataclass
class AppConfig:
    """Application configuration"""

    host: str = "0.0.0.0"
    port: int = 48889
    api_base: str = "/api"
    cors_origins: list = None

    def __post_init__(self):
        if self.cors_origins is None:
            self.cors_origins = ["http://localhost:5175", "http://localhost:28080"]


@dataclass
class MockDataConfig:
    """Mock data generation configuration"""

    default_steps: int = 1000
    default_metrics_count: int = 4
    default_noise_level: float = 0.1
    max_steps: int = 100000
    max_metrics: int = 50


@dataclass
class Config:
    """Main configuration"""

    app: AppConfig
    mock: MockDataConfig

    @classmethod
    def from_env(cls):
        """Load configuration from environment variables"""
        return cls(
            app=AppConfig(
                host=os.getenv("KOHAKU_BOARD_HOST", "0.0.0.0"),
                port=int(os.getenv("KOHAKU_BOARD_PORT", "48889")),
                api_base=os.getenv("KOHAKU_BOARD_API_BASE", "/api"),
            ),
            mock=MockDataConfig(
                default_steps=int(os.getenv("KOHAKU_BOARD_DEFAULT_STEPS", "1000")),
                default_metrics_count=int(
                    os.getenv("KOHAKU_BOARD_DEFAULT_METRICS", "4")
                ),
                default_noise_level=float(os.getenv("KOHAKU_BOARD_NOISE_LEVEL", "0.1")),
            ),
        )


cfg = Config.from_env()
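

# Illustrative sketch (not part of the original module): `cfg` is built once at
# import time, so KOHAKU_BOARD_* variables need to be set before this module is
# first imported. The helper name `read_env_int` is an assumption made for this
# example; it follows the same os.getenv-with-default pattern as from_env but
# reports which variable held a non-integer value.
def read_env_int(name, default):
    """Read an integer environment variable, falling back to `default`."""
    import os  # local import so the sketch runs on its own

    raw = os.getenv(name)
    if raw is None:
        return default
    try:
        return int(raw)
    except ValueError as exc:
        raise ValueError(f"{name} must be an integer, got {raw!r}") from exc


# Usage: read_env_int("KOHAKU_BOARD_PORT", 48889) returns 48889 when the
# variable is unset and the parsed integer when it is set.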

42
src/kohakuboard/logger.py Normal file
@@ -0,0 +1,42 @@
"""Logging configuration for KohakuBoard"""

import copy
import logging
import sys


class ColoredFormatter(logging.Formatter):
    """Colored log formatter"""

    COLORS = {
        "DEBUG": "\033[0;36m",  # Cyan
        "INFO": "\033[0;32m",  # Green
        "WARNING": "\033[0;33m",  # Yellow
        "ERROR": "\033[0;31m",  # Red
        "CRITICAL": "\033[1;31m",  # Bold Red
    }
    RESET = "\033[0m"

    def format(self, record):
        # Work on a copy so the color codes do not leak into other handlers
        # (or accumulate) when the same record is formatted more than once
        record = copy.copy(record)
        log_color = self.COLORS.get(record.levelname, self.RESET)
        record.levelname = f"{log_color}{record.levelname}{self.RESET}"
        record.name = f"\033[0;35m[{record.name}]{self.RESET}"  # Magenta
        return super().format(record)


def get_logger(name: str) -> logging.Logger:
    """Get a colored logger instance"""
    logger = logging.getLogger(name)

    # Attach the handler only once, even if get_logger is called repeatedly
    if not logger.handlers:
        handler = logging.StreamHandler(sys.stdout)
        formatter = ColoredFormatter("%(name)s %(levelname)s: %(message)s")
        handler.setFormatter(formatter)
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)

    return logger


# Pre-created loggers
logger_api = get_logger("API")
logger_mock = get_logger("MOCK")

54
src/kohakuboard/main.py Normal file
@@ -0,0 +1,54 @@
"""Main FastAPI application for KohakuBoard"""

from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

from kohakuboard.api.routers import experiments, mock
from kohakuboard.config import cfg
from kohakuboard.logger import logger_api

app = FastAPI(
    title="KohakuBoard API",
    description="ML Experiment Tracking API",
    version="0.1.0",
    docs_url=f"{cfg.app.api_base}/docs",
    openapi_url=f"{cfg.app.api_base}/openapi.json",
)

# CORS middleware
app.add_middleware(
    CORSMiddleware,
    allow_origins=cfg.app.cors_origins,
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

# Register routers
app.include_router(experiments.router, prefix=cfg.app.api_base, tags=["experiments"])
app.include_router(mock.router, prefix=cfg.app.api_base, tags=["mock"])


@app.get("/")
async def root():
    """Root endpoint"""
    return {
        "name": "KohakuBoard API",
        "version": "0.1.0",
        "docs": f"{cfg.app.api_base}/docs",
    }


@app.get("/health")
async def health():
    """Health check endpoint"""
    return {"status": "healthy"}


if __name__ == "__main__":
    import uvicorn

    logger_api.info(f"Starting KohakuBoard API on {cfg.app.host}:{cfg.app.port}")
    uvicorn.run(
        "kohakuboard.main:app", host=cfg.app.host, port=cfg.app.port, reload=True
    )