mirror of
https://github.com/MLSysBook/TinyTorch.git
synced 2026-05-09 00:53:10 -05:00
- Added KernelOptimizationProfiler class with CUDA performance analysis
- Implemented memory coalescing and warp divergence analysis
- Added tensor core utilization and kernel fusion detection
- Included multi-GPU scaling patterns and optimization
- Added comprehensive ML systems thinking questions