🚀 Publish Live Workflow
Overview
The publish-live workflow publishes the Machine Learning Systems textbook and manages its PDF. The key change is that the PDF is available for download but is NOT committed to git.
🔄 Workflow Steps
1. Manual Trigger
- Go to GitHub Actions → "🚀 Publish Live"
- Fill in the required fields:
- Description: What you're publishing
- Release Type: patch/minor/major
- Dev Commit: Specific commit to publish
- Confirm: Type "PUBLISH" to confirm
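The manual-trigger inputs above can be checked before the workflow does any real work. The Python sketch below is hypothetical: the function name and the accepted commit format (an abbreviated or full hex SHA of 7 to 40 characters) are assumptions, not the workflow's actual validation logic.

```python
import re

VALID_RELEASE_TYPES = {"patch", "minor", "major"}
# Assumption: the dev commit is given as a 7- to 40-character hex Git SHA.
COMMIT_RE = re.compile(r"^[0-9a-f]{7,40}$")


def validate_publish_inputs(release_type: str, dev_commit: str, confirm: str) -> None:
    """Raise ValueError if a manual-trigger input is unusable (hypothetical helper)."""
    if confirm != "PUBLISH":
        raise ValueError('Confirmation must be exactly "PUBLISH"')
    if release_type not in VALID_RELEASE_TYPES:
        raise ValueError(f"Unknown release type: {release_type!r}")
    if not COMMIT_RE.match(dev_commit.lower()):
        raise ValueError(f"Not a commit hash: {dev_commit!r}")


# Passes silently; a lowercase "publish" confirmation would raise ValueError.
validate_publish_inputs("minor", "a1b2c3d", "PUBLISH")
```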
2. Validation & Merge
- ✅ Validates that the dev commit exists and is on the dev branch
- ✅ Calculates next version number
- ✅ Merges dev → main branch
- ✅ Creates release tag
- ✅ Pushes to main (triggers production build)
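The "calculates next version number" step maps the chosen release type onto the last tag. A minimal sketch, assuming tags follow the vX.Y.Z pattern used in the release URLs; the function name is illustrative, not taken from the workflow.

```python
import re

SEMVER_TAG = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")


def next_version(current_tag: str, release_type: str) -> str:
    """Return the next vX.Y.Z tag for a patch/minor/major release (sketch)."""
    match = SEMVER_TAG.match(current_tag)
    if match is None:
        raise ValueError(f"Tag is not of the form vX.Y.Z: {current_tag!r}")
    major, minor, patch = (int(part) for part in match.groups())
    if release_type == "major":
        major, minor, patch = major + 1, 0, 0  # breaking release resets minor and patch
    elif release_type == "minor":
        minor, patch = minor + 1, 0  # new content resets patch
    elif release_type == "patch":
        patch += 1
    else:
        raise ValueError(f"Unknown release type: {release_type!r}")
    return f"v{major}.{minor}.{patch}"


print(next_version("v1.4.2", "patch"))  # v1.4.3
print(next_version("v1.4.2", "minor"))  # v1.5.0
print(next_version("v1.4.2", "major"))  # v2.0.0
```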
3. Production Build
- 🔄 Triggers the "🎮 Controller" workflow
- 📚 Builds HTML and PDF versions
- 📄 Compresses PDF with Ghostscript
- 📦 Uploads build artifacts (including PDF)
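The document does not state which Ghostscript settings the build uses. The sketch below builds a typical pdfwrite invocation; the /ebook preset, compatibility level, and helper names are assumptions for illustration.

```python
import shutil
import subprocess


def ghostscript_compress_cmd(src: str, dst: str, preset: str = "/ebook") -> list[str]:
    """Build a Ghostscript pdfwrite command that rewrites src into a smaller dst.

    The /ebook preset is an assumption; the real workflow's settings may differ.
    """
    return [
        "gs",
        "-sDEVICE=pdfwrite",
        "-dCompatibilityLevel=1.4",
        f"-dPDFSETTINGS={preset}",
        "-dNOPAUSE",
        "-dQUIET",
        "-dBATCH",
        f"-sOutputFile={dst}",
        src,
    ]


def compress_pdf(src: str, dst: str) -> None:
    # Fail early with a clear message if Ghostscript is not installed.
    if shutil.which("gs") is None:
        raise RuntimeError("Ghostscript (gs) not found on PATH")
    subprocess.run(ghostscript_compress_cmd(src, dst), check=True)
```

Lower presets such as /screen shrink the file further at the cost of image quality.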
4. PDF Handling (NEW)
- 📄 Downloads PDF from build artifacts
- 📦 Creates GitHub Release
- 📄 Uploads PDF to Release Assets
- ✅ PDF is available for download but NOT in git
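Once the build artifact has been downloaded and extracted to a local directory, the PDF still has to be located inside it. A minimal sketch, assuming the artifact's internal layout is unknown (so it searches recursively) and exactly one copy of the PDF should be present; the helper name is hypothetical.

```python
from pathlib import Path

PDF_NAME = "Machine-Learning-Systems.pdf"


def find_pdf(artifact_dir: str, name: str = PDF_NAME) -> Path:
    """Return the single PDF with the given name under an extracted artifact directory."""
    matches = sorted(Path(artifact_dir).rglob(name))
    if not matches:
        # Usually means the PDF build failed or the artifact expired.
        raise FileNotFoundError(f"{name} not found under {artifact_dir}")
    if len(matches) > 1:
        raise RuntimeError(f"Ambiguous: {len(matches)} copies of {name} found")
    return matches[0]
```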
📄 PDF Management Strategy
What Changed
- Before: PDF was committed to git and pushed to gh-pages
- After: PDF is uploaded to GitHub Release assets only
Benefits
- ✅ No Large Files in Git: PDF is not tracked in repository
- ✅ Faster Clones: Repository stays small
- ✅ Version Control: Each release has its own PDF
- ✅ Download Access: PDF still available for students
- ✅ Clean History: Git history doesn't bloat with PDF changes
PDF Access Points
- Direct Download: https://mlsysbook.ai/assets/Machine-Learning-Systems.pdf
- Release Assets: https://github.com/harvard-edge/cs249r_book/releases/download/vX.Y.Z/Machine-Learning-Systems.pdf
- GitHub Pages: Available in the deployed site
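The release-asset link depends only on the version tag. A small sketch that fills in the vX.Y.Z placeholder; the helper name is illustrative.

```python
REPO = "harvard-edge/cs249r_book"
PDF_NAME = "Machine-Learning-Systems.pdf"


def release_pdf_url(tag: str, repo: str = REPO, asset: str = PDF_NAME) -> str:
    """Build the GitHub release-asset download URL for a version tag."""
    if not tag.startswith("v"):
        tag = f"v{tag}"  # release tags use the vX.Y.Z form
    return f"https://github.com/{repo}/releases/download/{tag}/{asset}"


print(release_pdf_url("1.2.0"))
```

GitHub also serves the newest release's asset at a stable path of the form releases/latest/download/<asset>, which is handy for links that should never need updating.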
🔧 Technical Implementation
Modified Files
- .github/workflows/publish-live.yml: Added PDF download and upload steps
- .github/workflows/quarto-build.yml: Modified deployment to exclude PDF from git
- .gitignore: Added PDF exclusion rules
- tools/scripts/quarto_publish/publish.sh: Removed PDF commit
- binder: Removed PDF commit
Key Changes
- PDF Download Step: Downloads PDF from build artifacts
- Release Asset Upload: Uploads PDF to GitHub Release
- Git Exclusion: PDF is copied to assets but not committed
- Artifact Management: PDF is preserved in build artifacts
🧪 Testing
Use the test script to verify PDF handling:
python tools/scripts/test_publish_live.py
This will check:
- ✅ PDF exists in assets/
- ✅ PDF is NOT tracked by git
- ✅ PDF is in .gitignore
- ✅ Git status is clean
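The internals of test_publish_live.py are not shown here. The sketch below is a hypothetical approximation of the four checks, using git ls-files and git status --porcelain; the .gitignore check is a rough fnmatch-based approximation, and git check-ignore remains the authoritative test.

```python
import fnmatch
import subprocess
from pathlib import Path

PDF_PATH = "assets/Machine-Learning-Systems.pdf"


def gitignore_covers(gitignore_text: str, path: str = PDF_PATH) -> bool:
    """Approximate check that some .gitignore pattern matches path.

    fnmatch is looser than Git's matching ("*" also crosses "/") and "!"
    negations are ignored, so treat a True result as a hint, not proof.
    """
    name = path.rsplit("/", 1)[-1]
    for raw in gitignore_text.splitlines():
        pattern = raw.strip()
        if not pattern or pattern.startswith(("#", "!")):
            continue
        pattern = pattern.lstrip("/")
        if fnmatch.fnmatch(path, pattern) or fnmatch.fnmatch(name, pattern):
            return True
    return False


def is_tracked(repo: Path, path: str = PDF_PATH) -> bool:
    # `git ls-files <path>` prints the path only if Git tracks it.
    out = subprocess.run(["git", "ls-files", path], cwd=repo,
                         capture_output=True, text=True, check=True).stdout
    return out.strip() != ""


def tree_is_clean(repo: Path) -> bool:
    # `git status --porcelain` prints nothing when the working tree is clean.
    out = subprocess.run(["git", "status", "--porcelain"], cwd=repo,
                         capture_output=True, text=True, check=True).stdout
    return out.strip() == ""


def check(repo: Path) -> dict[str, bool]:
    gitignore = repo / ".gitignore"
    return {
        "pdf_exists": (repo / PDF_PATH).is_file(),
        "pdf_untracked": not is_tracked(repo),
        "pdf_ignored": gitignore_covers(gitignore.read_text() if gitignore.exists() else ""),
        "git_clean": tree_is_clean(repo),
    }
```

Run check(Path(".")) from the repository root; every value should be True after a successful publish.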
📋 Usage Instructions
For Regular Publishing
- Develop on dev branch: Make your changes
- Test thoroughly: Ensure everything works
- Trigger publish-live: Use GitHub Actions UI
- Monitor progress: Watch the workflow run
- Verify deployment: Check the live site
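As an alternative to the Actions UI, the GitHub CLI can dispatch a workflow with gh workflow run and one -f key=value flag per input. The input keys below are hypothetical; use the names declared under workflow_dispatch inputs in publish-live.yml.

```python
def gh_publish_command(description: str, release_type: str, dev_commit: str) -> list[str]:
    """Build a `gh workflow run` invocation for the publish-live workflow.

    The input keys are assumptions; match them to publish-live.yml.
    """
    inputs = {
        "description": description,
        "release_type": release_type,
        "dev_commit": dev_commit,
        "confirm": "PUBLISH",
    }
    cmd = ["gh", "workflow", "run", "publish-live.yml"]
    for key, value in inputs.items():
        cmd += ["-f", f"{key}={value}"]  # -f passes each input as a raw string
    return cmd
```

Pass the list to subprocess.run(cmd, check=True) to trigger the run; passing a list rather than a shell string keeps descriptions with spaces intact.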
For Emergency Fixes
- Create hotfix branch: git checkout -b hotfix/description
- Make minimal changes: Fix the critical issue
- Merge to dev: git checkout dev && git merge hotfix/description
- Publish immediately: Use the publish-live workflow
- Clean up: Delete the hotfix branch
🔍 Troubleshooting
Common Issues
PDF Not Found in Artifacts
- Check that the build completed successfully
- Verify PDF was built in the quarto-build workflow
- Check artifact retention settings
Release Creation Fails
- Verify GitHub token permissions
- Check that the release tag doesn't already exist
- Ensure proper JSON formatting in release notes
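Malformed JSON usually comes from splicing release notes into a hand-built string. Serializing with json.dumps escapes quotes and newlines automatically. The sketch assumes a request body for GitHub's REST "create a release" endpoint; the workflow may use a release action instead, and using the tag as the release name is an assumption.

```python
import json


def release_payload(tag: str, notes: str, target: str = "main") -> str:
    """Serialize a create-release request body; json.dumps handles all escaping."""
    body = {
        "tag_name": tag,
        "target_commitish": target,
        "name": tag,  # assumption: release title equals the tag
        "body": notes,
        "draft": False,
        "prerelease": False,
    }
    return json.dumps(body)


payload = release_payload("v1.5.0", 'Fixes "Chapter 3" figures\nand updates the PDF')
print(json.loads(payload)["body"])
```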
PDF Upload Fails
- Check file size limits (each GitHub release asset must be under 2 GiB)
- Verify PDF file exists and is valid
- Check network connectivity
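The first two checks above can run before any upload is attempted. A minimal sketch with a hypothetical helper name; the header test is a cheap sanity check, not full PDF validation.

```python
from pathlib import Path

# GitHub requires each release asset to be under 2 GiB.
MAX_ASSET_BYTES = 2 * 1024**3


def check_pdf_for_upload(path: str) -> int:
    """Return the PDF size in bytes, or raise if it should not be uploaded as-is."""
    pdf = Path(path)
    if not pdf.is_file():
        raise FileNotFoundError(f"No PDF at {pdf}")
    size = pdf.stat().st_size
    if size >= MAX_ASSET_BYTES:
        raise ValueError(f"{pdf} is {size} bytes; release assets must be under 2 GiB")
    # PDF files normally begin with the "%PDF-" header; an HTML error page would not.
    with pdf.open("rb") as fh:
        if fh.read(5) != b"%PDF-":
            raise ValueError(f"{pdf} does not start with a %PDF- header")
    return size
```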
Debug Commands
# Check PDF status
python tools/scripts/test_publish_live.py
# Check git status
git status
# Check if PDF is tracked
git ls-files assets/Machine-Learning-Systems.pdf
# Check .gitignore
grep -n "Machine-Learning-Systems.pdf" .gitignore
📊 Monitoring
Workflow Status
- GitHub Actions: Monitor workflow progress
- Release Page: Check release creation
- Live Site: Verify deployment
- PDF Access: Test download links
Success Indicators
- ✅ Workflow completes without errors
- ✅ Release is created with PDF asset
- ✅ Live site is updated
- ✅ PDF is accessible for download
- ✅ Git repository remains clean (no PDF commits)
🎯 Best Practices
- Always test on dev first: Never push changes straight to main and publish them without going through dev
- Use descriptive commit messages: Helps with release notes
- Monitor build times: PDF generation can take 15-30 minutes
- Check artifact retention: Ensure build artifacts (including the PDF) are kept long enough for the release step to download them
- Verify deployment: Always test the live site after publishing