Switch mlsysim from CC-BY-NC-SA-4.0 to Apache-2.0 for the Python package. Creative Commons itself advises against CC licenses for software: they do not cover patent grants, source/object distinctions, or derivative-work definitions the way a software license does, and the NonCommercial clause makes the package unusable in any corporate setting — defeating the goal of a teaching tool intended for working engineers. Apache-2.0 preserves full copyright-holder rights to commercialize (dual-license future versions, sell commercial editions, offer paid support/SaaS), while enabling the broad adoption appropriate for a textbook companion. The textbook prose remains CC-BY-NC-SA-4.0. The docs site footer is updated to state both licenses explicitly so readers do not conflate the two. - mlsysim/LICENSE.md : full Apache-2.0 text with copyright - mlsysim/pyproject.toml : license = "Apache-2.0" + OSI classifier - mlsysim/CITATION.cff : license: Apache-2.0 - mlsysim/RELEASE_NOTES_0.1.0 : license line split (code vs docs) - mlsysim/paper/paper.tex : "MIT license" → "Apache-2.0 license" - mlsysim/docs/config/_quarto-html.yml : footer shows both licenses
5.0 KiB
MLSys·im 0.1.0 — Initial Release
Release date: 2026-04-01
MLSys·im is a first-principles analytical engine for predicting performance, cost, and carbon footprint of ML systems. It is the computational companion to the Machine Learning Systems textbook.
This is the first public release. The API surface, the 22-wall taxonomy, and the solver portfolio are all considered stable for the 0.1.x line.
Install
pip install mlsysim
Verify the install:
python -c "import mlsysim; print(mlsysim.__version__)"
mlsysim eval Llama3_8B H100 --batch-size 32
Requires Python 3.10+. No GPU required — the engine computes from closed-form equations.
Five-line quickstart
import mlsysim
from mlsysim import Engine
profile = Engine.solve(
model = mlsysim.Models.ResNet50,
hardware = mlsysim.Hardware.Cloud.A100,
batch_size = 1,
precision = "fp16",
)
print(f"Bottleneck: {profile.bottleneck}") # → Memory
print(f"Latency: {profile.latency.to('ms'):~.2f}") # → 0.54 ms
print(f"Throughput: {profile.throughput:.0f}") # → 1843 / second
Highlights
Core framework
- 22-wall taxonomy organizing every constraint that bounds ML system performance, across six domains (Node, Data, Algorithm, Fleet, Ops, Analysis).
- 26 analytical solvers (
Model,Solver,Optimizerclasses) covering all 22 walls. - Pint unit system with dimensional analysis enforced at runtime.
- TraceableConstant pattern — every default carries a citation.
- Pipeline composer for chaining solvers with
explain()andrun(). - 3-tier evaluation scorecard: Feasibility → Performance → Macro/Economics.
- Design Space Exploration (DSE) engine with declarative search and constraint evaluation.
Hardware Registry (15+ accelerators)
V100, A100, H100, H200, B200, GB200 NVL72, MI300X, TPUv5p, T4, Cerebras CS-3, Jetson Orin NX, ESP32-S3, nRF52840, Himax WE-I Plus, DGX Spark, MacBook M3 Max, iPhone 15 Pro, Pixel 8.
Full precision support: FP32, TF32, BF16, FP16, FP8, INT8, INT4. Multi-level memory hierarchy with HBM, SRAM, and Flash (TinyML). All specifications verified against manufacturer datasheets.
Model Registry
GPT-2/3/4, Llama-2/3 (7B/8B/70B), BERT Base/Large, ResNet-50, MobileNetV2, AlexNet, Mamba, Stable Diffusion v1.5, DS-CNN, WakeVision. HuggingFace importer included for arbitrary Transformer workloads.
Analytical models
SingleNodeModel, NetworkRooflineModel, EfficiencyModel, ForwardModel, ServingModel, ContinuousBatchingModel, WeightStreamingModel, TailLatencyModel, DataModel, TransformationModel, TopologyModel, ScalingModel, InferenceScalingModel, CompressionModel, DistributedModel, ReliabilityModel, OrchestrationModel, EconomicsModel, SustainabilityModel, CheckpointModel, ResponsibleEngineeringModel, SensitivitySolver, SynthesisSolver, ParallelismOptimizer, BatchingOptimizer, PlacementOptimizer.
CLI
mlsysim eval Evaluate the analytical physics of an ML system (YAML or CLI flags)
mlsysim serve Evaluate LLM serving (prefill + decode)
mlsysim optimize Search the design space for optimal configurations
mlsysim zoo Explore the built-in registries
mlsysim audit Profile your local hardware against the Iron Law
mlsysim schema Export the JSON Schema for the mlsys.yaml configuration file
Testing
367 tests, 100% pass rate. Coverage includes formula unit tests with known answers, full solver suite, physics-bound validation across all registry hardware, wall-taxonomy completeness, pipeline composition, and three optimization backends (exhaustive, OR-tools, scipy).
Documentation
- Site: mlsysbook.ai/mlsysim/
- Tutorials: roofline, memory wall, KV cache, scaling to 1000 GPUs, geography, sensitivity, full-stack audit
- API reference: mlsysbook.ai/mlsysim/api/
Known limitations & gotchas
- First-order analytical model. Predictions are typically within 15–30% of measured throughput on well-optimized workloads. Use MLSys·im to compare options and identify bottlenecks; validate with empirical benchmarks before committing to a production SLA. See
accuracy.qmdfor full validation against MLPerf v4.0. - Slide PDFs. Many tutorials cross-link to lecture slides at
github.com/harvard-edge/cs249r_book/releases/download/slides-latest/*.pdf. Theslides-latestrelease tag is not yet published; these links will resolve once the slides ship. - Hosted notebook launchers. Google Colab and Binder buttons are planned but not wired up for 0.1.0. Tutorials run locally on any Python 3.10+ environment.
Project links
- Source: github.com/harvard-edge/cs249r_book/tree/dev/mlsysim
- Issues: github.com/harvard-edge/cs249r_book/issues
- License: Apache-2.0 (code) · CC-BY-NC-SA-4.0 (documentation)
- Citation: see
CITATION.cff