mirror of
https://github.com/harvard-edge/cs249r_book.git
synced 2026-05-08 18:01:20 -05:00
fix: close issues 1531, 1532, 1502, 1508

fix(labs): bump mlsysim wheel ref from 0.1.0 to 0.1.1 in all 33 labs

Closes #1531. pyproject.toml was bumped to 0.1.1 in PR #1523, but the micropip.install() URLs in every lab still pointed to 0.1.0, causing TestWheelConsistency and the WASM smoke test to fail on every PR.

fix(ci): add .codespellrc to suppress false-positive spell-check failures

Closes #1532. Skips vendored JS in socratiq/src_shadow and whitelists legitimate technical terms: clos (Clos network topology), fpr (False Positive Rate), rin (ring buffer variable), ans, fo, curren (contributor name).

fix(staffml): correct tinyml-0384 KWS question's bad distractor and napkin math

Closes #1502. The option 3 distractor used a valid throughput setup (4 × 80 = 320 MFLOPS) and then broke it with a false MHz = MFLOPS equivalence. It has been replaced with an unambiguously wrong distractor. The napkin math now shows both solution paths (latency: 80/336 = 238 ms; throughput: 4 × 80 = 320 MFLOPS < 336 MFLOPS), and common_mistake has been updated to flag the MHz vs. MFLOPS confusion.

fix(tinytorch): strip solution blocks when creating student notebooks

Closes #1508. tito module start created notebooks from src/ verbatim via jupytext, including all working implementations between BEGIN/END SOLUTION markers. _create_module_from_src now strips those blocks and replaces them with raise NotImplementedError stubs before conversion, so students receive blank scaffolding instead of solved code. Verified on module 01: 13 solution blocks stripped, 13 stubs inserted.
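A `.codespellrc` along the lines described might look like the sketch below. The exact skip glob is an assumption (only the socratiq/src_shadow directory name comes from the commit message); `skip` and `ignore-words-list` are standard codespell settings, and the whitelisted terms are the ones listed above.

```ini
; .codespellrc (sketch)
[codespell]
; Vendored JS is not ours to spell-check (path pattern assumed).
skip = */socratiq/src_shadow/*
; Legitimate technical terms and names, not typos:
; clos (Clos network topology), fpr (False Positive Rate),
; rin (ring buffer variable), curren (contributor name).
ignore-words-list = clos,fpr,rin,ans,fo,curren
```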
committed by Vijay Janapa Reddi
parent 61fcc6eec0
commit ff870d5f30
@@ -25,24 +25,25 @@ details:
   model's workload (80 MFLOPs) by the MCU's performance (336 MFLOPS). This gives us an inference time
   of ~238.1 milliseconds. Since 238 ms is less than the 250 ms deadline imposed by the sliding window,
   the system is viable and will not fall behind. It has a slack of about 12 ms per inference cycle.
- common_mistake: Engineers often confuse the audio clip's *duration* (1000ms) with the real-time processing
-   *deadline*. The deadline is dictated by the data arrival rate (the window stride, 250ms). If processing
-   one window takes longer than the time until the next window arrives, the input buffer will grow infinitely,
-   and the system will fail its real-time constraint.
- napkin_math: 'Cortex-M4 peak performance: ~336 MFLOPS (168 MHz × 2 FLOPS/cycle). Model workload: 80
-   MFLOPs. Inference time = 80 / 336 = **0.2381 seconds = 238.1 ms**. Sliding window stride: 250 ms.
-   238 ms < 250 ms → **deadline met**, but only 12 ms slack (4.8%). This is dangerously tight. Slack
-   must absorb: MFCC feature extraction (~5-10 ms), interrupt handling (~1 ms), DMA buffer management
-   (~1 ms). With only 12 ms slack, any of these could push past the deadline. Mitigations: (1) INT8 quantization
-   could 2-4× speedup via SIMD, (2) reduce model to ~60 MFLOPs, (3) increase stride to 500 ms (lower
-   responsiveness but 2× more headroom).'
+ common_mistake: 'Two common errors. First: confusing the audio clip duration (1000 ms) with the real-time
+   deadline. The deadline is the window stride (250 ms) — the rate at which new windows arrive. Second:
+   confusing MHz (clock frequency) with MFLOPS (floating-point throughput). 168 MHz is not 168 MFLOPS.
+   The Cortex-M4 executes ~2 FLOPS per cycle, giving 336 MFLOPS. Mixing these units produces nonsense answers.'
+ napkin_math: 'Cortex-M4 peak performance: 168 MHz × 2 FLOPS/cycle = **336 MFLOPS**. Model workload:
+   80 MFLOPs. **Approach 1 (latency):** Inference time = 80 MFLOPs / 336 MFLOPS = 0.2381 s = **238 ms**.
+   238 ms < 250 ms stride → deadline met, 12 ms slack. **Approach 2 (throughput):** Windows per second
+   = 1000 ms / 250 ms = 4. Required throughput = 4 × 80 MFLOPs = **320 MFLOPS**. 320 MFLOPS < 336 MFLOPS
+   → deadline met, 5% headroom. Both approaches agree. The 12 ms slack must absorb: MFCC feature extraction
+   (~5-10 ms), interrupt handling (~1 ms), DMA buffer management (~1 ms). This is dangerously tight.
+   Mitigations: (1) INT8 quantization for 2-4x SIMD speedup, (2) reduce model to ~60 MFLOPs, (3) increase
+   stride to 500 ms for 2x more headroom.'
  options:
  - Yes. The MCU takes ~238 ms per inference (80 MFLOPs / 336 MFLOPS), which is less than the 250 ms deadline
    from the window stride.
  - Yes, easily. The MCU's inference time of ~238 ms is much shorter than the 1000 ms audio clip, leaving
    over 750 ms of slack.
  - No. The MCU is too slow. The required processing time is 4.2 seconds (336 MFLOPs / 80 MFLOPs), which
    badly misses the 250 ms deadline.
+ - No. The MCU is too slow. The required processing time is 4.2 seconds (80 MFLOPs / 20 MFLOPS at 10 MHz
+   effective throughput), which badly misses the 250 ms deadline.
- - No. The system needs to process 4 windows per second (1000ms / 250ms), requiring 320 MFLOPS (4 * 80),
-   but the MCU only runs at 168 MHz.
  correct_index: 0
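Both napkin-math paths in the question can be checked with a few lines of arithmetic. This is a quick sketch; all figures (168 MHz, 2 FLOPS/cycle, 80 MFLOPs, 250 ms stride) come from the question itself.

```python
# Quick check of both napkin-math paths for the tinyml-0384 KWS question.

clock_mhz = 168          # Cortex-M4 clock
flops_per_cycle = 2      # per the question's assumption
peak_mflops = clock_mhz * flops_per_cycle          # 336 MFLOPS

workload_mflop = 80      # model cost per inference
stride_ms = 250          # sliding-window stride = real-time deadline

# Approach 1 (latency): time per inference vs. the stride.
latency_ms = workload_mflop / peak_mflops * 1000   # ~238.1 ms
slack_ms = stride_ms - latency_ms                  # ~11.9 ms

# Approach 2 (throughput): required vs. available MFLOPS.
windows_per_s = 1000 / stride_ms                   # 4 windows/s
required_mflops = windows_per_s * workload_mflop   # 320 MFLOPS

print(f"latency {latency_ms:.1f} ms, slack {slack_ms:.1f} ms")
print(f"required {required_mflops:.0f} of {peak_mflops} available MFLOPS")
assert latency_ms < stride_ms and required_mflops < peak_mflops
```

Both paths agree that the deadline is met, with roughly 12 ms of latency slack and 16 MFLOPS of throughput headroom.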
@@ -394,15 +394,59 @@ class ModuleWorkflowCommand(BaseCommand):
 
         Uses the same conversion logic as 'tito src export' but only creates
         the student-facing notebook, without exporting to the tinytorch package.
+        Solution blocks (### BEGIN SOLUTION ... ### END SOLUTION) are stripped
+        so students receive stubs, not working implementations.
         """
+        import tempfile
+        import shutil
         from ..export_utils import convert_py_to_notebook
 
         src_path = self.config.project_root / "src" / module_name
         if not src_path.exists():
             return False
 
-        # Convert src/*.py to modules/*.ipynb using jupytext
-        return convert_py_to_notebook(src_path, self.venv_path, self.console)
+        src_file = src_path / f"{module_name}.py"
+        if not src_file.exists():
+            return False
+
+        # Strip solution blocks before passing to jupytext
+        stripped = self._strip_solutions(src_file.read_text(encoding="utf-8"))
+
+        # Write stripped source to a temp dir that mirrors the expected layout
+        with tempfile.TemporaryDirectory() as tmp:
+            tmp_module_dir = Path(tmp) / module_name
+            tmp_module_dir.mkdir()
+            tmp_src = tmp_module_dir / f"{module_name}.py"
+            tmp_src.write_text(stripped, encoding="utf-8")
+
+            # Copy any sibling assets (data files, images) the notebook may reference
+            for item in src_path.iterdir():
+                if item.name != f"{module_name}.py":
+                    dest = tmp_module_dir / item.name
+                    if item.is_dir():
+                        shutil.copytree(item, dest)
+                    else:
+                        shutil.copy2(item, dest)
+
+            return convert_py_to_notebook(tmp_module_dir, self.venv_path, self.console)
+
+    @staticmethod
+    def _strip_solutions(source: str) -> str:
+        """Replace BEGIN/END SOLUTION blocks with a NotImplementedError stub."""
+        lines = source.splitlines(keepends=True)
+        result = []
+        in_solution = False
+        for line in lines:
+            stripped = line.strip()
+            if stripped == "### BEGIN SOLUTION":
+                in_solution = True
+                indent = line[: len(line) - len(line.lstrip())]
+                result.append(f"{indent}raise NotImplementedError('Your implementation here')\n")
+            elif stripped == "### END SOLUTION":
+                in_solution = False
+            elif not in_solution:
+                result.append(line)
+        return "".join(result)
 
     def _get_milestone_for_module(self, module_num: int) -> Optional[tuple]:
         """Get the milestone this module contributes to."""
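The stripping logic in the diff can be exercised standalone. Below is the same algorithm as ModuleWorkflowCommand._strip_solutions, lifted into a free function, applied to a small hypothetical module source (the relu example is illustrative, not from a real TinyTorch module):

```python
def strip_solutions(source: str) -> str:
    """Replace BEGIN/END SOLUTION blocks with a NotImplementedError stub
    (same logic as ModuleWorkflowCommand._strip_solutions in the diff)."""
    result, in_solution = [], False
    for line in source.splitlines(keepends=True):
        stripped = line.strip()
        if stripped == "### BEGIN SOLUTION":
            in_solution = True
            # Reuse the marker line's indentation so the stub stays valid Python.
            indent = line[: len(line) - len(line.lstrip())]
            result.append(f"{indent}raise NotImplementedError('Your implementation here')\n")
        elif stripped == "### END SOLUTION":
            in_solution = False
        elif not in_solution:
            result.append(line)
    return "".join(result)


# Hypothetical instructor source with one solution block.
src = (
    "def relu(x):\n"
    "    ### BEGIN SOLUTION\n"
    "    return max(0, x)\n"
    "    ### END SOLUTION\n"
)
print(strip_solutions(src))
# The solution body and both markers are gone; only the indented stub remains.
```

Note that the stub inherits the marker line's indentation, which is why the generated student notebook still parses even when the solution block sits inside a nested scope.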
||||