cs249r_book/mlsysim/docs/agentic-mcp.qmd
Vijay Janapa Reddi 1eb30f5f86 fix(mlsysim): harden release QA and paper artifacts
Align the MLSys·im code, docs, paper, website, workflows, and lab wheel for the 0.1.1 release. This also fixes runtime/API issues found during release review and prepares the paper PDF plus archive package.
2026-04-25 10:06:01 -04:00

---
title: "Agentic Workflows & MCP"
subtitle: "Using MLSys·im as the physics backend for LLM Agents."
---
The ultimate vision for `mlsysim` is not just to educate humans, but to serve as the **ground-truth physics engine for autonomous AI systems**.

Large Language Models (like Claude 3.5 Sonnet, GPT-4o, or Gemini Pro) are excellent at writing code and structuring YAML, but they frequently get multi-step quantitative reasoning wrong. If you ask an LLM to calculate the Inter-Token Latency of a 70B model on 8x H100s with PagedAttention, it will often return a confident but incorrect number.

By wrapping `mlsysim` in the **Model Context Protocol (MCP)**, you give your agents the ability to design hardware clusters dynamically, run them through a dimensionally strict physics engine, and interpret the reported bottlenecks to iteratively improve the design.

---
## 1. Using MLSys·im with Claude Desktop (MCP)

We provide a production-ready MCP server that exposes the `mlsysim` engine to Claude Desktop.

### Setup

1. Ensure you have installed `mlsysim` and the `mcp` Python package:

    ```bash
    pip install mlsysim mcp
    ```

2. Open your Claude Desktop configuration file.

    - **macOS:** `~/Library/Application Support/Claude/claude_desktop_config.json`
    - **Windows:** `%APPDATA%\Claude\claude_desktop_config.json`

3. Add the `mlsysim` server:

    ```json
    {
      "mcpServers": {
        "mlsysim": {
          "command": "python3",
          "args": ["-m", "mlsysim.examples.mcp_server"]
        }
      }
    }
    ```

4. Restart Claude Desktop. You will now see a hammer icon 🛠️ indicating the tools are available.

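
If you would rather script step 3 than edit the file by hand, the server entry can be merged into an existing configuration with a few lines of Python. The following is a minimal sketch using only the standard library. The helper name `add_mlsysim_server` is hypothetical and not part of `mlsysim`, and the sketch assumes the config file is absent, empty, or valid JSON. Call it with the macOS or Windows path listed in step 2; other keys and other MCP servers already in the file are left untouched.

```python
import json
from pathlib import Path


def add_mlsysim_server(config_path: Path) -> dict:
    """Merge the mlsysim MCP entry into a Claude Desktop config file.

    Hypothetical helper (not part of mlsysim). Existing keys are preserved.
    """
    config = {}
    if config_path.exists():
        text = config_path.read_text(encoding="utf-8")
        if text.strip():
            config = json.loads(text)

    # Same entry as the JSON snippet in step 3.
    servers = config.setdefault("mcpServers", {})
    servers["mlsysim"] = {
        "command": "python3",
        "args": ["-m", "mlsysim.examples.mcp_server"],
    }

    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


# Example call on macOS (not executed here):
# add_mlsysim_server(
#     Path.home() / "Library/Application Support/Claude/claude_desktop_config.json"
# )
```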
### What to ask Claude

You can now ask Claude questions that require deep hardware simulation:

> *"I need to serve Llama-3 70B. Can you use your mlsysim tool to find out if it fits on a single H100? If it doesn't, design a cluster that does, and tell me the annual TCO."*

Claude will typically generate a YAML plan that matches the schema, call the `evaluate_cluster_yaml` tool, see the Out-of-Memory (OOM) failure, revise its design (for example, to use 2 nodes), and report the computed result back to you.

---
## 2. The Agentic "Predict-Compute-Reflect" Loop

If you are building your own multi-agent system (using LangChain, AutoGen, or raw Gemini APIs), `mlsysim`'s schema architecture is built specifically for you.

* **The Input:** Export our schema using `mlsysim schema --type plan`. Feed this JSON schema directly into your LLM's system prompt or tool definition so the LLM knows how to structure the request.
* **The Execution:** Call `mlsysim eval your_file.yaml --output json` (or use the Python API).
* **The Feedback:** Because `mlsysim` outputs a strictly typed, flat JSON dictionary, your agent can easily parse the results. If `f_status == "FAIL"`, the agent reads the `f_summary` (e.g., "OOM: Requires 140 GB but only has 80 GB") and adjusts its design autonomously.

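
The three steps above can be sketched as a small control loop. This is an illustrative outline under stated assumptions, not the repository's implementation: `design_loop` and `evaluate_yaml_text` are hypothetical names, the LLM call is left as a `propose` callback you supply, and the sketch assumes the CLI prints its JSON result to stdout. Only the `mlsysim eval ... --output json` command and the `f_status` / `f_summary` fields come from the description above. Keeping the evaluator injectable lets you swap in the Python API or a stub when testing the agent logic.

```python
import json
import os
import subprocess
import tempfile
from typing import Callable, Optional, Tuple


def evaluate_yaml_text(plan_yaml: str) -> dict:
    """Write a plan to a temp file and run the CLI command quoted above.

    Assumption: the JSON result is printed to stdout.
    """
    with tempfile.NamedTemporaryFile("w", suffix=".yaml", delete=False) as fh:
        fh.write(plan_yaml)
        path = fh.name
    try:
        proc = subprocess.run(
            ["mlsysim", "eval", path, "--output", "json"],
            capture_output=True,
            text=True,
        )
    finally:
        os.unlink(path)
    return json.loads(proc.stdout)


def design_loop(
    propose: Callable[[Optional[str]], str],
    evaluate: Callable[[str], dict],
    max_rounds: int = 5,
) -> Optional[Tuple[str, dict, int]]:
    """Predict (propose a plan), compute (evaluate it), reflect (feed back f_summary).

    Returns (plan, result, round_number) for the first plan that does not
    FAIL, or None if every round fails.
    """
    feedback: Optional[str] = None
    for round_no in range(1, max_rounds + 1):
        plan = propose(feedback)       # predict: the LLM drafts or revises a YAML plan
        result = evaluate(plan)        # compute: the simulator scores it
        if result.get("f_status") != "FAIL":
            return plan, result, round_no
        # reflect: the failure summary becomes context for the next proposal
        feedback = result.get("f_summary", "unknown failure")
    return None
```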

We have included a conceptual Python implementation of this loop in our repository at [`mlsysim/examples/gemini_design_loop.py`](https://github.com/harvard-edge/cs249r_book/blob/dev/mlsysim/examples/gemini_design_loop.py).

---
## 3. Exposed MCP Tools

When running as an MCP server, `mlsysim` exposes the following tools to the connected agent:

| Tool | Description |
|:-----|:-----------|
| `get_schemas` | Return the current JSON schema for valid MLSys·im YAML plans |
| `evaluate_cluster_yaml` | Evaluate a YAML cluster specification through the full 3-lens scorecard (Feasibility, Performance, Macro) |

The agent can call these tools programmatically. The YAML schema can be exported with:

```bash
mlsysim schema --type plan
```

Feed this schema into your agent's system prompt or tool definition so it knows how to structure valid requests.

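
One way to do that from Python is to capture the export and embed it in the prompt text. The sketch below is a rough illustration: `export_plan_schema` and `build_system_prompt` are hypothetical helpers, the prompt wording is a placeholder, and the code assumes `mlsysim schema --type plan` prints the JSON schema to stdout. Adapt the final string to whatever prompt or tool-definition format your agent framework expects.

```python
import json
import subprocess


def export_plan_schema() -> dict:
    """Run the export command shown above and parse its output.

    Assumption: the command writes the JSON schema to stdout.
    """
    proc = subprocess.run(
        ["mlsysim", "schema", "--type", "plan"],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(proc.stdout)


def build_system_prompt(schema: dict) -> str:
    """Embed the schema verbatim so the model can see every allowed field."""
    return (
        "You design ML clusters. Reply with a YAML plan that validates "
        "against this JSON schema. Use only names the schema allows.\n\n"
        + json.dumps(schema, indent=2)
    )


# Example (requires mlsysim on PATH, so it is not executed here):
# prompt = build_system_prompt(export_plan_schema())
```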
---
## 4. Troubleshooting

**Claude doesn't show the hammer icon:**
: Make sure you restarted Claude Desktop after editing the config. Check that the configured command, `python3 -m mlsysim.examples.mcp_server`, starts without errors in your terminal.

**Agent gets OOM errors:**
: This is expected behavior: it means the model doesn't fit on the specified hardware. The agent should read the error message and adjust (e.g., add nodes, reduce precision, or pick larger hardware).

**Agent hallucinates hardware specs:**
: Remind the agent to call `get_schemas` and use registry names from the schema/docs rather than inventing specs. The `llms.txt` file at the root of the docs site contains agent-specific guidance.

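
To see why the OOM case arises and how each suggested adjustment changes the picture, a back-of-the-envelope check of weight memory is often enough. The sketch below is a simplification and not how `mlsysim` computes feasibility: it counts model weights only and ignores KV cache, activations, and runtime overhead, so passing this check is necessary but not sufficient. The function names and the bytes-per-parameter table are illustrative assumptions; the 70B model and the 80 GB device match the example error message quoted earlier.

```python
import math

# Bytes needed to store one parameter at each precision (weights only).
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(n_params: int, precision: str) -> float:
    """Weights-only footprint in GB (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[precision] / 1e9


def min_devices(required_gb: float, device_gb: float) -> int:
    """Smallest device count whose combined memory covers the requirement."""
    return math.ceil(required_gb / device_gb)


n_params = 70_000_000_000  # a 70B-parameter model
device_gb = 80.0           # per-device memory from the example OOM message

# Lever 1: add devices. FP16 weights need 140 GB, so one 80 GB device is not enough.
fp16_gb = weight_memory_gb(n_params, "fp16")
print(fp16_gb, min_devices(fp16_gb, device_gb))

# Lever 2: reduce precision. INT8 weights need 70 GB and fit on one device.
int8_gb = weight_memory_gb(n_params, "int8")
print(int8_gb, min_devices(int8_gb, device_gb))

# Lever 3: pick larger hardware by raising device_gb and recomputing.
print(min_devices(fp16_gb, 160.0))
```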

---

## 5. Why This Matters

The "academic simulator graveyard" is filled with tools that were too hard for humans to compile and too unstructured for machines to use.

By defining `mlsysim` through **strict Pydantic schemas** and standardizing the **22 ML Systems Walls**, we have created an intermediate representation (IR) that both humans and AI agents can understand. In the near future, you will not manually calculate whether a new model architecture is viable; you will ask your Agentic Architect to run 10,000 simulations against the `mlsysim` physics engine while you sleep.