---
title: "Agentic Workflows & MCP"
subtitle: "Using MLSys·im as the physics backend for LLM Agents."
---

The ultimate vision for `mlsysim` is not just to educate humans, but to serve as the **ground-truth physics engine for autonomous AI systems**.

Large language models (such as Claude 3.5 Sonnet, GPT-4o, or Gemini Pro) are good at writing code and structuring YAML, but they are unreliable at multi-step quantitative reasoning. Ask an LLM to calculate the inter-token latency of a 70B model on 8x H100s with PagedAttention, and it will often return a confident but wrong number.

By wrapping `mlsysim` in the **Model Context Protocol (MCP)**, you give your agents the ability to dynamically design hardware clusters, run them through a dimensionally strict physics engine, and interpret the precise bottlenecks to iteratively improve the design.

---

## 1. Using MLSys·im with Claude Desktop (MCP)

We provide a production-ready MCP server that exposes the `mlsysim` engine to Claude Desktop.

### Setup

1. Ensure you have installed `mlsysim` and the `mcp` Python package:
   ```bash
   pip install mlsysim mcp
   ```

2. Open your Claude Desktop configuration file.
   - **macOS:** `~/Library/Application Support/Claude/claude_desktop_config.json`
   - **Windows:** `%APPDATA%\Claude\claude_desktop_config.json`

3. Add the `mlsysim` server:
   ```json
   {
     "mcpServers": {
       "mlsysim": {
         "command": "python3",
         "args": ["-m", "mlsysim.examples.mcp_server"]
       }
     }
   }
   ```

4. Restart Claude Desktop. You will now see a hammer icon 🛠️ indicating the tools are available.
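If the icon does not appear, one possible cause is a malformed or incomplete config file. The following is a minimal sketch of a sanity check, not part of `mlsysim`: the function name `check_mcp_config` is hypothetical, and it only verifies the JSON structure shown in step 3, not that the server itself starts.

```python
import json
from pathlib import Path


def check_mcp_config(config_path, server_name="mlsysim"):
    """Return a list of problems found in a Claude Desktop config file.

    An empty list means the file parses as JSON and contains an entry
    shaped like the one in step 3. It does not prove the server can start.
    """
    path = Path(config_path).expanduser()
    if not path.is_file():
        return [f"config file not found: {path}"]
    try:
        config = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"config is not valid JSON: {exc}"]

    if not isinstance(config, dict):
        return ["top-level JSON value must be an object"]
    servers = config.get("mcpServers")
    if not isinstance(servers, dict):
        return ["missing or malformed 'mcpServers' object"]
    server = servers.get(server_name)
    if not isinstance(server, dict):
        return [f"no '{server_name}' entry under 'mcpServers'"]

    problems = []
    if not isinstance(server.get("command"), str):
        problems.append("'command' must be a string")
    if not isinstance(server.get("args", []), list):
        problems.append("'args' must be a list")
    return problems
```

On macOS, for example, `check_mcp_config("~/Library/Application Support/Claude/claude_desktop_config.json")` returns an empty list when the entry is present and well formed.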

### What to ask Claude

You can now ask Claude questions that require deep hardware simulation:
> *"I need to serve Llama-3 70B. Can you use your mlsysim tool to find out if it fits on a single H100? If it doesn't, design a cluster that does, and tell me the annual TCO."*

Claude will generate a YAML plan that conforms to the schema, call the `evaluate_cluster_yaml` tool, see the Out-of-Memory (OOM) failure, revise its design to use 2 nodes, and report the simulator's result rather than a guess.
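The OOM failure in this scenario can be anticipated with back-of-the-envelope arithmetic. The sketch below is an illustration only and is not how `mlsysim` computes feasibility: it counts weights alone (ignoring KV cache, activations, and runtime overhead), assumes 2 bytes per parameter (FP16/BF16), and uses hypothetical helper names.

```python
import math


def weight_memory_gb(num_params, bytes_per_param):
    """Memory for model weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def min_devices_for_weights(weights_gb, device_memory_gb):
    """Smallest device count whose combined memory can hold the weights."""
    return math.ceil(weights_gb / device_memory_gb)


# 70B parameters at 2 bytes each, against an 80 GB accelerator.
weights = weight_memory_gb(70e9, 2)
print(weights)                               # 140.0
print(weights <= 80)                         # False
print(min_devices_for_weights(weights, 80))  # 2
```

This is a lower bound on memory. A full simulation also has to account for everything the weights-only estimate leaves out, which is why the agent delegates the real check to the tool.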

---

## 2. The Agentic "Predict-Compute-Reflect" Loop

If you are building your own multi-agent system (using LangChain, AutoGen, or raw Gemini APIs), `mlsysim`'s schema architecture is built specifically for you.

*   **The Input:** Export our schema using `mlsysim schema --type plan`. Feed this JSON schema directly into your LLM's system prompt or tool definition so the LLM knows how to structure the request.
*   **The Execution:** Call `mlsysim eval your_file.yaml --output json` (or use the Python API).
*   **The Feedback:** Because `mlsysim` outputs a strictly typed, flat JSON dictionary, your agent can parse the results without custom text scraping. If `f_status == "FAIL"`, the agent reads the `f_summary` (e.g., "OOM: Requires 140 GB but only has 80 GB") and adjusts its design autonomously.

We have included a conceptual Python implementation of this loop in our repository at [`mlsysim/examples/gemini_design_loop.py`](https://github.com/harvard-edge/cs249r_book/blob/dev/mlsysim/examples/gemini_design_loop.py).
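Independently of that repository example, the control flow of the loop can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the repository's implementation: `evaluate_yaml_text`, `design_loop`, and the two stand-in functions are hypothetical names, the CLI is assumed to print its JSON result to stdout, and only the `f_status` and `f_summary` fields described above are used.

```python
import json
import subprocess
import tempfile
from pathlib import Path


def evaluate_yaml_text(yaml_text):
    """Write a candidate plan to disk and evaluate it with the mlsysim CLI.

    Assumption: the JSON result is printed to stdout. The exit code is not
    checked because its meaning on a FAIL result is not documented here.
    """
    with tempfile.TemporaryDirectory() as workdir:
        plan_path = Path(workdir) / "candidate.yaml"
        plan_path.write_text(yaml_text, encoding="utf-8")
        completed = subprocess.run(
            ["mlsysim", "eval", str(plan_path), "--output", "json"],
            capture_output=True, text=True, check=False,
        )
    return json.loads(completed.stdout)


def design_loop(propose, evaluate, max_rounds=5):
    """Alternate between proposing a plan and evaluating it.

    propose(feedback) -> plan   (feedback is None on the first round)
    evaluate(plan)    -> dict   (expected to carry f_status and f_summary)
    Returns (plan, result, rounds_used); plan is None if no round succeeded.
    """
    feedback, result = None, None
    for round_index in range(1, max_rounds + 1):
        plan = propose(feedback)
        result = evaluate(plan)
        if result.get("f_status") != "FAIL":
            return plan, result, round_index
        feedback = result.get("f_summary", "evaluation failed without a summary")
    return None, result, max_rounds


# Stand-ins so the sketch runs without an LLM or mlsysim installed.
def fake_propose(feedback):
    return "plan with 1 device" if feedback is None else "plan with 2 devices"


def fake_evaluate(plan):
    if plan == "plan with 1 device":
        return {"f_status": "FAIL",
                "f_summary": "OOM: Requires 140 GB but only has 80 GB"}
    return {"f_status": "PASS"}  # placeholder value; only "FAIL" is documented


plan, result, rounds = design_loop(fake_propose, fake_evaluate)
print(plan, rounds)  # plan with 2 devices 2
```

In a real agent, `propose` would call your LLM with the previous `f_summary` as feedback, and `evaluate` would be `evaluate_yaml_text` or the Python API. Capping the number of rounds keeps a confused agent from looping indefinitely.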

---

## 3. Exposed MCP Tools

When running as an MCP server, `mlsysim` exposes the following tools to the connected agent:

| Tool | Description |
|:-----|:-----------|
| `get_schemas` | Return the current JSON schema for valid MLSys·im YAML plans |
| `evaluate_cluster_yaml` | Evaluate a YAML cluster specification through the full 3-lens scorecard (Feasibility, Performance, Macro) |

The agent can call these tools programmatically. The JSON schema for YAML plans can also be exported from the command line:

```bash
mlsysim schema --type plan
```

Feed this schema into your agent's system prompt or tool definition so it knows how to structure valid requests.
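One way to do this is to embed the exported schema verbatim in the system prompt. The sketch below assumes you have already parsed the schema into a Python dictionary (for example, by saving the command's output to a file and loading it with `json.load`, which assumes the command prints the schema to stdout). The function name, the prompt wording, and the placeholder schema are illustrative and are not part of `mlsysim`.

```python
import json


def build_system_prompt(plan_schema):
    """Embed a JSON schema in a system prompt for a plan-writing agent."""
    schema_text = json.dumps(plan_schema, indent=2, sort_keys=True)
    return (
        "You design ML clusters as YAML plans for mlsysim.\n"
        "Every plan must validate against this JSON schema:\n"
        f"{schema_text}\n"
        "Use only names that appear in the schema; do not invent hardware specs."
    )


# Placeholder schema for demonstration only; the real one comes from
# `mlsysim schema --type plan` and will look different.
placeholder_schema = {
    "type": "object",
    "properties": {"placeholder_field": {"type": "string"}},
}

prompt = build_system_prompt(placeholder_schema)
print(prompt)
```

Serializing with `sort_keys=True` keeps the prompt stable across runs, which helps when you compare agent behavior between schema versions.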

---

## 4. Troubleshooting

**Claude doesn't show the hammer icon:**
: Make sure you restarted Claude Desktop after editing the config. Check that the configured command, `python3 -m mlsysim.examples.mcp_server`, runs without errors in your terminal.

**Agent gets OOM errors:**
: This is expected behavior: the model does not fit on the specified hardware. The agent should read the error message and adjust (e.g., add nodes, reduce precision, or pick larger hardware).

**Agent hallucinates hardware specs:**
: Remind the agent to call `get_schemas` and use registry names from the schema/docs rather than inventing specs. The `llms.txt` file at the root of the docs site contains agent-specific guidance.

---

## 5. Why This Matters

The "academic simulator graveyard" is filled with tools that were too hard for humans to compile and too unstructured for machines to use.

By defining `mlsysim` through **strict Pydantic schemas** and standardizing the **22 ML Systems Walls**, we have created an intermediate representation (IR) that both humans and AI agents can understand. In the near future, you will not manually calculate whether a new model architecture is viable; you will ask your Agentic Architect to run 10,000 simulations against the `mlsysim` physics engine while you sleep.