KohakuBoard Client Examples
This directory contains example scripts demonstrating how to use the KohakuBoard client library.
Installation
# Install dependencies (DuckDB backend - recommended, default)
pip install duckdb pandas pillow loguru
# Optional: For Parquet backend (legacy)
pip install pyarrow
# Optional: For Jupyter notebook viewer
pip install jupyter matplotlib
# Or install in development mode
pip install -e .
Important: Windows Users
On Windows, multiprocessing requires the if __name__ == "__main__": guard. All examples include this, but when writing your own scripts, make sure to use:
from kohakuboard.client import Board

def main():
    board = Board(name="my_experiment")
    # ... your training code ...

if __name__ == "__main__":
    main()
Why? Windows uses spawn instead of fork for multiprocessing, which requires the main module to be importable without side effects.
Examples
1. Basic Usage (kohakuboard_basic.py)
Simple logging of scalar metrics without explicit step tracking.
python examples/kohakuboard_basic.py
Features demonstrated:
- Creating a board with configuration
- Logging scalar metrics (non-blocking)
- Auto-increment step tracking
- Flushing and finishing
Output: ./kohakuboard/<board_id>/data/board.duckdb (DuckDB) or metrics.parquet (Parquet)
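For orientation, here is a minimal sketch of the pattern this example follows, built only from the API calls referenced in this README (Board, board.log, board.flush, board.finish); exact signatures may differ from the actual script.

```python
from kohakuboard.client import Board


def main():
    # Create a board with run configuration
    board = Board(name="basic_example", config={"lr": 1e-3, "batch_size": 32})

    # Each log call is non-blocking and advances the auto-increment step
    for i in range(100):
        board.log(loss=1.0 / (i + 1), accuracy=i / 100)

    board.flush()   # ensure buffered records hit disk
    board.finish()  # stop the background writer process


if __name__ == "__main__":
    main()
```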
2. Advanced Usage (kohakuboard_advanced.py)
Complete training simulation with images, tables, and explicit steps.
python examples/kohakuboard_advanced.py
Features demonstrated:
- Context manager usage (automatic cleanup)
- Explicit step tracking (board.step())
- Logging images from numpy arrays
- Logging tables with class metrics
- Epoch-based training simulation
Output:
- ./kohakuboard/<board_id>/data/board.duckdb - All data (DuckDB backend)
  - OR metrics.parquet, media.parquet, tables.parquet (Parquet backend)
- ./kohakuboard/<board_id>/media/*.png - Actual image files
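A condensed sketch of the same flow (context manager, explicit steps, image logging), assuming the with-statement support and the board.log_images(...) call mentioned in this README; table logging is omitted because its API is not shown here.

```python
import numpy as np
from kohakuboard.client import Board


def main():
    # The context manager handles cleanup automatically on exit
    with Board(name="advanced_example", config={"epochs": 3}) as board:
        for epoch in range(3):
            board.step()  # advance the explicit global_step once per epoch

            for batch_idx in range(50):
                board.log(train_loss=float(np.random.rand()))

            # Log a few images as numpy arrays (HWC uint8 assumed here)
            images = [
                np.random.randint(0, 255, (64, 64, 3), dtype=np.uint8)
                for _ in range(4)
            ]
            board.log_images("predictions", images)


if __name__ == "__main__":
    main()
```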
3. Interactive Viewer (view_board_duckdb.ipynb)
Jupyter notebook for exploring board data interactively.
# From project root
jupyter notebook examples/view_board_duckdb.ipynb
Features:
- Auto-finds latest board
- Supports both DuckDB and Parquet backends
- SQL queries with results
- Plot metrics by global_step
- Display images inline
- Export to CSV
- Database statistics
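The "auto-finds latest board" step amounts to picking the newest directory under ./kohakuboard. A stand-alone equivalent (not the notebook's exact code) might look like:

```python
from pathlib import Path

# Board IDs start with a timestamp (e.g. 20250126_153045_a1b2c3d4),
# so lexicographic order matches creation order
boards = sorted(p for p in Path("kohakuboard").iterdir() if p.is_dir())
latest = boards[-1]
print(f"Latest board: {latest.name}")
```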
4. Explicit Steps Deep Dive (kohakuboard_explicit_steps.py)
Detailed demonstration of the dual-step tracking system.
python examples/kohakuboard_explicit_steps.py
Features demonstrated:
- Difference between step (auto-increment) and global_step (explicit)
- How to use global_step for epoch tracking
- How all batches in an epoch share the same global_step
Key Concept:
auto_step (column: step):
- Increments automatically on every log call
- Used for: Timeline, sequential ordering
global_step (column: global_step):
- Controlled explicitly via board.step()
- Used for: Epochs, checkpoints, phases
- All logs between step() calls share same global_step
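A short sketch of how the two columns end up populated, using only the calls documented above (the exact starting value of global_step may differ):

```python
from kohakuboard.client import Board

board = Board(name="step_demo")

for epoch in range(3):
    board.step()  # bumps global_step once per epoch

    for batch in range(5):
        # Each call increments the auto step column; all five rows logged
        # in this epoch share the global_step set by board.step() above
        board.log(loss=0.1 * batch)

board.finish()
```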
Directory Structure After Running Examples
DuckDB Backend (default):
kohakuboard/
└── <board_id>/ # e.g., 20250126_153045_a1b2c3d4
├── metadata.json # Board info, config, backend type
├── data/
│ └── board.duckdb # Single database (metrics, media, tables)
├── media/ # Actual media files
│ ├── predictions_0_00000000_abc123.png
│ └── ...
└── logs/
├── output.log # Captured stdout/stderr
└── writer.log # Writer process logs
Parquet Backend (legacy):
kohakuboard/
└── <board_id>/
├── metadata.json
├── data/
│ ├── metrics.parquet # Scalar metrics
│ ├── media.parquet # Image metadata
│ └── tables.parquet # Table data
├── media/ # Actual media files
└── logs/
Using the Data
Option 1: Jupyter Notebook (Recommended)
Use the interactive viewer notebook:
jupyter notebook examples/view_board_duckdb.ipynb
Features:
- Auto-finds latest board
- SQL queries with visualization
- Display images inline
- Export to CSV
Option 2: Python Script (DuckDB)
import duckdb
# Connect to board
conn = duckdb.connect("kohakuboard/<board_id>/data/board.duckdb", read_only=True)
# Query metrics with SQL
df = conn.execute("""
SELECT * FROM metrics
WHERE global_step = 0
ORDER BY step
""").df()
# Get all available metrics
metrics = conn.execute("SELECT * FROM metrics").df()
# Query media
media = conn.execute("SELECT * FROM media WHERE type = 'image'").df()
# Close connection
conn.close()
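If you want the CSV export without the notebook, DuckDB can write query results straight to a file; this assumes the same metrics table queried above.

```python
import duckdb

conn = duckdb.connect("kohakuboard/<board_id>/data/board.duckdb")

# COPY writes the query result to an external CSV file
conn.execute(
    "COPY (SELECT * FROM metrics ORDER BY step) TO 'metrics.csv' (HEADER, DELIMITER ',')"
)

conn.close()
```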
Option 3: Python Script (Parquet Backend)
import pandas as pd
# Read metrics
df = pd.read_parquet("kohakuboard/<board_id>/data/metrics.parquet")
# Query by global_step
epoch_0 = df[df["global_step"] == 0]
# Read media
media_df = pd.read_parquet("kohakuboard/<board_id>/data/media.parquet")
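To plot from the Parquet backend, matplotlib (already listed under the optional notebook dependencies) works directly on the DataFrame. The "loss" column below is a placeholder; check df.columns for the metric names you actually logged.

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_parquet("kohakuboard/<board_id>/data/metrics.parquet")
print(df.columns.tolist())  # inspect which metric columns exist

# Replace "loss" with one of your own metric names
plt.plot(df["step"], df["loss"])
plt.xlabel("step")
plt.ylabel("loss")
plt.show()
```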
Best Practices
- Just create the board and let atexit handle cleanup:

      # Create board at the start of your training script
      board = Board(name="my_experiment", config={"lr": 0.001})

      # Your existing training loop - no changes needed!
      for epoch in range(100):
          board.step()
          for batch in train_loader:
              board.log(loss=...)

      # Board automatically finishes on program exit (atexit hook)
      # Or call board.finish() explicitly if needed

  Why not with? Integrating into existing training loops is easier without restructuring your code into a with block.

- Use explicit steps for epochs/phases:

      for epoch in range(100):
          board.step()  # Set global_step
          for batch in train_loader:
              board.log(loss=...)  # All batches share epoch's global_step

- Flush before long-running operations:

      board.log(checkpoint="saving...")
      board.flush()  # Ensure log is written
      save_checkpoint(model)  # Long operation

- Use descriptive metric names:

      # Good
      board.log(train_loss=0.5, val_accuracy=0.95)

      # Avoid
      board.log(loss=0.5, acc=0.95)

- Log images selectively:

      # Don't log every batch
      if batch_idx % 100 == 0:
          board.log_images("samples", images)
Performance Notes
- Non-blocking: All board.log*() calls are non-blocking (return immediately)
- Background writer: Separate process handles all disk I/O
- DuckDB (default):
  - True incremental append (no read overhead!)
  - Automatic compression (RLE, bit-packing, dictionary)
  - ACID transactions
  - Single file storage
- Parquet (legacy):
  - Read-concat-write (slower)
  - Manual compression (Snappy)
  - Multiple files
- Deduplication: Images are content-addressed, so duplicate images share storage (see the sketch below)
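Content addressing simply means deriving the file name from a hash of the file's bytes, so identical images collapse to a single file on disk. A generic illustration of the idea (not KohakuBoard's actual implementation):

```python
import hashlib
from pathlib import Path


def save_content_addressed(data: bytes, media_dir: Path) -> Path:
    """Write bytes under a hash-derived name; duplicate content reuses the file."""
    digest = hashlib.sha256(data).hexdigest()[:16]
    path = media_dir / f"{digest}.png"
    if not path.exists():  # identical bytes were already saved once
        path.write_bytes(data)
    return path
```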
Troubleshooting
Board doesn't finish gracefully:
- Always call board.finish() or use the context manager
- Check logs/writer.log for errors
Images not saving:
- Make sure PIL is installed: pip install pillow
- Check logs/writer.log for conversion errors
High memory usage:
- Reduce queue size: board.queue = mp.Queue(maxsize=1000)
- Flush more frequently: board.flush()
DuckDB errors:
- Install duckdb: pip install duckdb
- Check file permissions on board.duckdb
Parquet read errors:
- Install pyarrow: pip install pyarrow
- Check file permissions
Want to use Parquet instead of DuckDB:
board = Board(name="my_experiment", backend="parquet")
Backend Comparison
| Feature | DuckDB (default) | Parquet (legacy) |
|---|---|---|
| Incremental Append | ✅ True (no read!) | ❌ Read-concat-write |
| Compression | ✅ Automatic (adaptive) | ✅ Manual (Snappy/ZSTD) |
| Schema Evolution | ✅ ALTER TABLE (instant) | ❌ Full file rewrite |
| Query | ✅ SQL (native) | ✅ SQL (via DuckDB read) |
| Write Performance | ✅ Fast (append-only) | ❌ Slow (rewrite entire file) |
| Files | ✅ Single .duckdb | ⚠️ 3 separate files |
| File Size | ✅ Smaller (better compression) | ✅ Small (columnar) |
| Portability | ⚠️ DuckDB-specific | ✅ Standard Parquet |
Note: Both can be queried with SQL (DuckDB can read Parquet files). The key difference is write performance - DuckDB uses true incremental append, while Parquet requires reading and rewriting the entire file.
Recommendation: Use DuckDB (default) for better write performance and automatic compression. Use Parquet only if you need compatibility with other tools that can't read DuckDB files.
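As the note above says, DuckDB can query Parquet files in place, so even the legacy backend gets SQL for free:

```python
import duckdb

# Query a Parquet-backend board directly with SQL; no conversion step needed
df = duckdb.query(
    "SELECT * FROM 'kohakuboard/<board_id>/data/metrics.parquet' WHERE global_step = 0"
).df()
```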
Next Steps
- View boards in KohakuBoard UI
- Use Jupyter notebook for interactive exploration
- Query with SQL (DuckDB backend)
- Export to CSV, JSON
- Sync to remote storage (coming soon)
- Share boards with team (coming soon)