mirror of https://github.com/MLSysBook/TinyTorch.git
synced 2026-06-01 19:07:04 -05:00
- Added KernelOptimizationProfiler class with CUDA performance analysis
- Implemented memory coalescing and warp divergence analysis
- Added tensor core utilization and kernel fusion detection
- Included multi-GPU scaling patterns and optimization
- Added comprehensive ML systems thinking questions