Updates the requirements on [rich](https://github.com/Textualize/rich) to permit the latest version. - [Release notes](https://github.com/Textualize/rich/releases) - [Changelog](https://github.com/Textualize/rich/blob/master/CHANGELOG.md) - [Commits](https://github.com/Textualize/rich/compare/v13.0.0...v15.0.0) --- updated-dependencies: - dependency-name: rich dependency-version: 15.0.0 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Note
📌 Early release (2026)
MLSys·im shipped with the 2026 MLSysBook refresh. The modeling platform, APIs, and lab integrations are actively iterated as we harden the simulator and teaching workflows.
Feedback — GitHub issues or pull requests.
🚀 MLSys·im: The Modeling Platform
The physics-grounded analytical simulator powering the Machine Learning Systems ecosystem.
Provides a unified "Single Source of Truth" (SSoT) for modeling systems from sub-watt microcontrollers to exaflop-scale global fleets.
🏗 The 5-Layer Analytical Stack
mlsysim implements a "Progressive Lowering" architecture, separating high-level workloads from the physical infrastructure that executes them.
| Layer | Domain | Key Components |
|---|---|---|
| Layer A | Workload Representationmlsysim.models |
FLOPs, parameters, and intensity. e.g., Llama3_70B, ResNet50 |
| Layer B | Hardware Registrymlsysim.hardware |
Concrete specs for real-world silicon. e.g., H100, TPUv5p, Jetson |
| Layer C | Infrastructuremlsysim.infra |
Grid profiles and datacenter sustainability. e.g., PUE, Carbon Intensity, WUE |
| Layer D | Systems & Topologymlsysim.systems |
Fleet configurations and network fabrics. e.g., Doorbell, AutoDrive Scenarios |
| Layer E | Execution & Resolversmlsysim.core.solver |
The 3-tier math engine: Models, Solvers, and Optimizers (Design space search). |
🚀 Quick Usage: The Agent-Ready CLI
mlsysim is a first-principles analytical calculator for ML systems. It provides a terminal UI for humans and a strict JSON API for CI/CD pipelines and AI agents.
Accuracy note: mlsysim predictions are typically within 2–5× of measured performance for well-characterized workloads. For production capacity planning, always validate with benchmarks. This tool formalizes the back-of-envelope math that senior engineers do intuitively — it is not a substitute for profiling or load testing.
1. Explore the Registry (The Zoo)
Discover built-in hardware, models, and infrastructure without reading source code:
mlsysim zoo hardware
mlsysim zoo models
2. Quick Evaluation (CLI Flags)
Evaluate the physics of a workload on a specific hardware node instantly: mlsysim eval Llama3_8B H100 --batch-size 32
3. Deep Simulation (Infrastructure as Code)
Define your entire cluster and SLA constraints in a declarative mlsys.yaml file:
# example_cluster.yaml
version: "1.0"
name: "Llama-3 70B training audit"
workload:
name: "Llama3_70B"
batch_size: 4096
hardware:
name: "H100"
nodes: 64
ops:
region: "Quebec"
duration_days: 14.0
constraints:
assert:
- metric: "performance.latency"
max: 50.0
Then compile and evaluate the 3-lens scorecard (Feasibility, Performance, Macro): mlsysim eval example_cluster.yaml
4. CI/CD & Agentic Automation
Every command supports strict, schema-validated JSON output. If an assert constraint is violated, the CLI returns a semantic Exit Code 3.
# Export the JSON Schema for your IDE or AI Agent
mlsysim schema > schema.json
# Run an evaluation in a CI pipeline
tco=$(mlsysim --output json eval example_cluster.yaml | jq .m_tco_usd)
5. Design Space Search (Optimizers)
Use the Tier 3 Engineering Engine to automatically find the optimal configuration:
mlsysim optimize parallelism example_cluster.yaml
mlsysim optimize placement example_cluster.yaml --carbon-tax 150
🛡 Stability & Integrity
Because this core powers a printed textbook, we enforce strict Invariant Verification. Every physical constant is traceable to a primary source (datasheet or paper), and dimensional integrity is enforced via pint.
⚠️ What This Tool Does Not Model
MLSysim is an analytical hardware calculator, not a production deployment simulator. The 22 walls model physical and economic constraints that bound ML system performance. Several critical production concerns are deliberately out of scope:
| Concern | Why it matters | Where to learn more |
|---|---|---|
| Data drift / distribution shift | The #1 cause of production ML failures — model accuracy degrades silently as input distributions change | Sculley et al. (2015), "Hidden Technical Debt in ML Systems" |
| Model versioning & rollback | Production requires running multiple versions, A/B testing, and safe rollback | Huyen (2022), Designing Machine Learning Systems |
| Monitoring & observability | You cannot manage what you cannot measure — prediction distributions, latency percentiles, error rates | Google SRE Book (2016); Huyen (2022) |
| Feature store freshness | Stale features silently degrade real-time models (recommendations, fraud detection) | Uber Michelangelo (2017) |
| Software bugs & misconfigurations | Most outages are caused by software, not hardware | Barroso et al. (2018) |
| Human factors | Team velocity, on-call burden, and organizational alignment often dominate outcomes | Brooks (1975), The Mythical Man-Month |
Passing all 22 walls is necessary but not sufficient for a successful production deployment.
Students using this tool should understand that infrastructure physics (what mlsysim models) is one dimension of a multi-dimensional engineering challenge.
📖 How to Cite
If you use mlsysim in your research or teaching, please cite:
@software{mlsysim2026,
author = {Janapa Reddi, Vijay},
title = {{MLSys$\cdot$im}: First-Principles Infrastructure Modeling for Machine Learning Systems},
year = {2026},
url = {https://mlsysbook.ai/mlsysim},
version = {0.1.1},
institution = {Harvard University}
}
🛠 Installation
MLSys·im is designed to be highly modular. Install only what you need:
# Core physics engine only (fastest, smallest footprint)
pip install mlsysim
# The CLI and YAML support are included in the base package.
# The [cli] extra is retained as a backward-compatible no-op.
pip install "mlsysim[cli]"
# Install with dependencies for interactive labs (Marimo, Plotly)
pip install "mlsysim[labs]"
🐍 Python API Usage
The framework is just as powerful inside a Python script or Jupyter Notebook. The SystemEvaluator provides a clean, unified entry point for full-stack analysis:
import mlsysim
# 1. Define the scenario
model = mlsysim.Models.Language.Llama3_8B
hardware = mlsysim.Hardware.Cloud.H100
# 2. Run the evaluation
evaluation = mlsysim.SystemEvaluator.evaluate(
scenario_name="Llama-3 8B on H100",
model_obj=model,
hardware_obj=hardware,
batch_size=32,
precision="fp16",
efficiency=0.45
)
# 3. View the beautifully formatted scorecard
print(evaluation.scorecard())
Efficiency Parameter Guide
The efficiency parameter (0.0–1.0) captures the gap between peak hardware performance and what your software stack actually achieves. Use these guidelines:
| Scenario | Efficiency | Rationale |
|---|---|---|
| Training (Megatron-LM, large Transformer) | 0.40–0.55 | Well-optimized GEMM + FlashAttention |
| Training (PyTorch eager, small model) | 0.08–0.15 | Kernel launch overhead dominates |
| Inference decode, batch=1 | 0.01–0.05 | Memory-bound; compute nearly idle |
| Inference decode, batch=32+ | 0.15–0.35 | Batch amortizes weight loading |
| Inference prefill, long context | 0.30–0.50 | Compute-bound GEMM + attention |
| TinyML (TFLite Micro on ESP32) | 0.05–0.15 | Interpreter overhead, no tensor cores |
Contributors
Thanks to these wonderful people for helping improve MLSys·im!
Legend: 🪲 Bug Hunter · ⚡ Code Warrior · 📚 Documentation Hero · 🎨 Design Artist · 🧠 Idea Generator · 🔎 Code Reviewer · 🧪 Test Engineer · 🛠️ Tool Builder
Vijay Janapa Reddi 🧑💻 🎨 ✍️ 🧠 maintenance |
Peter Koellner 🪲 ✍️ |
Rocky 🪲 🧑💻 |
Zeljko Hrcek 🧑💻 |
Recognize a contributor: Comment on any issue or PR:
@all-contributors please add @username for code, doc, ideas, or bug
License
Code: Apache License 2.0 — free for commercial and non-commercial use, with patent grant and attribution requirement.
Documentation and textbook prose: Creative Commons Attribution-NonCommercial-ShareAlike 4.0 (CC-BY-NC-SA-4.0) — the tutorials and prose on mlsysbook.ai/mlsysim are part of the Machine Learning Systems textbook and carry its license.
The two licenses are intentionally separate: the Python package is permissively licensed so engineers and researchers can use it anywhere (including commercially), while the textbook prose retains its non-commercial protection to prevent republication as a derivative textbook.
Copyright © 2026 Vijay Janapa Reddi and MLSys·im contributors.