# Mirror of https://github.com/MLSysBook/TinyTorch.git
# (synced 2026-05-02 09:51:34 -05:00)
assessment:
- Understand why naive loops have poor cache performance
- Implement cache-friendly blocked matrix multiplication showing 10-50x speedups
- Recognize why NumPy provides 100x+ speedups over custom implementations
- Build a backend system that automatically chooses the optimal implementation
- 'Apply the ''free speedup'' principle: use better tools, don''t write faster code'
description: 'Master the easiest optimization: using better backends! Learn why naive
  loops are slow, how cache-friendly blocking helps, and why NumPy provides 100x+
  speedups.'
difficulty: Advanced
estimated_time: 3-4 hours
exports:
- matmul_naive
- matmul_blocked
- matmul_numpy
- OptimizedBackend
- matmul
- set_backend
learning_objectives:
- Understand CPU cache hierarchy and memory access performance bottlenecks
- Implement cache-friendly blocked matrix multiplication algorithms
- Build vectorized operations with optimized memory access patterns
- Design transparent backend systems for automatic optimization selection
- Measure and quantify real performance improvements scientifically
- Apply systems thinking to optimization decisions in ML workflows
name: acceleration
prerequisites:
- 'Module 2: Tensor operations and NumPy fundamentals'
- 'Module 4: Linear layers and matrix multiplication'
- Understanding of basic algorithmic complexity (Big-O notation)
tags:
- performance
- optimization
- systems
- hardware
- acceleration
- cache
- vectorization
- backends
title: Hardware Acceleration - The Simplest Optimization