[PR #3103] [MERGED] Improve SEO/AEO discovery surface for awesome-python.com #16038

Closed
opened 2026-05-02 08:18:51 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/vinta/awesome-python/pull/3103
Author: @vinta
Created: 5/1/2026
Status: Merged
Merged: 5/1/2026
Merged by: @vinta

Base: master ← Head: feature/better-seo


📝 Commits (6)

  • f99b8ba update gitignore
  • 2e79365 feat: tighten homepage metadata
  • 1b962cb fix: trim generated HTML whitespace
  • 545f460 feat(website): add discovery files and markdown alternate
  • f8cde59 feat(website): add sitemap lastmod
  • d988f63 feat(seo): add Content-Signal directive to robots.txt

📊 Changes

5 files changed (+259 additions, -26 deletions)


📝 .gitignore (+6 -6)
📝 README.md (+1 -1)
📝 website/build.py (+67 -2)
📝 website/templates/base.html (+18 -17)
📝 website/tests/test_build.py (+167 -0)

📄 Description

Summary

Adds the SEO/AEO foundation for awesome-python.com so the site is easier for traditional search engines and AI answer engines to discover, crawl, and represent accurately. All artifacts are generated by website/build.py so nothing in website/output/ is hand-edited.

  • Homepage metadata (website/templates/base.html, website/build.py): single intent-rich <title>, generated meta description with category/entry counts, canonical URL, aligned Open Graph + Twitter tags, summary_large_image card.
  • Root discovery files (website/build.py): generated robots.txt and sitemap.xml. sitemap.xml is valid XML with absolute canonical URLs and a <lastmod> date that reflects each build.
  • Markdown alternate (website/build.py, website/templates/base.html): generated /index.md and /llms.txt (README-derived, with the top-level Sponsors section stripped) plus a <link rel="alternate" type="text/markdown"> on the homepage. index.md is intentionally not added to the sitemap because it is an alternate representation, not a separate canonical page.
  • AI crawler policy (website/build.py): robots.txt now includes Content-Signal: search=yes, ai-input=yes, ai-train=yes inside the User-agent: * group. This is an open policy that allows search indexing, AI grounding/RAG, and model training, consistent with the existing MIT/CC license posture.
  • Misc: trim generated HTML whitespace; tighten homepage tagline in README.md; ignore docs/ and .agents/ planning artifacts.
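For readers who want a concrete picture of the root discovery files described above, here is a minimal sketch of how robots.txt and sitemap.xml could be rendered from Python. It is not the code in website/build.py: the function names, the SITE_URL constant, the Allow line, and the Sitemap line are all assumptions made for illustration. Only the Content-Signal directive, its placement in the User-agent: * group, the absolute URLs, and the per-build <lastmod> come from the description.

```python
from datetime import date
from xml.sax.saxutils import escape

# Assumption: canonical origin of the site, used to build absolute URLs.
SITE_URL = "https://awesome-python.com"


def render_robots_txt(site_url: str = SITE_URL) -> str:
    """Return a robots.txt body with Content-Signal inside the * group."""
    lines = [
        "User-agent: *",
        # Directive text taken from the PR description.
        "Content-Signal: search=yes, ai-input=yes, ai-train=yes",
        "Allow: /",  # assumption: not stated in the PR description
        "",
        f"Sitemap: {site_url}/sitemap.xml",  # assumption: not stated either
    ]
    return "\n".join(lines) + "\n"


def render_sitemap_xml(urls: list[str], build_date: date) -> str:
    """Return a sitemap with absolute URLs and one <lastmod> per entry."""
    entries = "".join(
        "  <url>\n"
        f"    <loc>{escape(url)}</loc>\n"
        f"    <lastmod>{build_date.isoformat()}</lastmod>\n"
        "  </url>\n"
        for url in urls
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}"
        "</urlset>\n"
    )
```

Passing date.today() as build_date would give a <lastmod> that changes with each build, and leaving /index.md out of urls keeps the alternate representation out of the sitemap.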

Output verified locally:

website/output/robots.txt
website/output/sitemap.xml
website/output/index.html
website/output/index.md
website/output/llms.txt
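The index.md and llms.txt outputs listed above are README-derived with the top-level Sponsors section stripped. One way such stripping could work is sketched below. The function name, the default heading level, and the assumption that the section is introduced by an ATX heading whose text is exactly "Sponsors" are all hypothetical. The actual logic in website/build.py may differ.

```python
def strip_section(markdown: str, title: str = "Sponsors", level: int = 1) -> str:
    """Drop one section: its heading and everything up to the next heading
    of the same or a higher level.

    Limitation of this sketch: lines starting with '#' inside fenced code
    blocks are treated as headings.
    """
    kept = []
    skipping = False
    for line in markdown.splitlines(keepends=True):
        if line.startswith("#"):
            hashes = len(line) - len(line.lstrip("#"))
            is_heading = line[hashes:hashes + 1] == " "
            if is_heading and hashes <= level:
                # A heading at this level or above starts or ends the skip.
                skipping = hashes == level and line[hashes:].strip() == title
        if not skipping:
            kept.append(line)
    return "".join(kept)
```

Deeper headings inside the removed section are dropped along with it, because only a heading at the same or a higher level ends the skip.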

Test plan

  • make build succeeds
  • uv run pytest website/tests/test_build.py -v (39 passed)
  • Inspected website/output/robots.txt, sitemap.xml, index.html, index.md, llms.txt
  • Post-deploy: curl -I https://awesome-python.com/robots.txt, .../sitemap.xml, .../llms.txt return 200 with expected content types
  • Re-run https://isitagentready.com/awesome-python.com after deploy
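The manual inspection step in the test plan could also be approximated by a small script. The sketch below checks a local output directory only and makes no network requests, so it does not replace the post-deploy curl checks. The function name check_output and the specific checks are assumptions for illustration, not part of website/tests/test_build.py.

```python
from pathlib import Path
from xml.etree import ElementTree as ET

# File names taken from the "Output verified locally" list.
EXPECTED_FILES = ("robots.txt", "sitemap.xml", "index.html", "index.md", "llms.txt")
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def check_output(output_dir: Path) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = [
        f"missing {name}"
        for name in EXPECTED_FILES
        if not (output_dir / name).is_file()
    ]
    if problems:
        return problems

    robots = (output_dir / "robots.txt").read_text(encoding="utf-8")
    if "Content-Signal:" not in robots:
        problems.append("robots.txt has no Content-Signal directive")

    try:
        root = ET.parse(str(output_dir / "sitemap.xml")).getroot()
    except ET.ParseError as exc:
        problems.append(f"sitemap.xml is not well-formed: {exc}")
        return problems

    locs = [el.text or "" for el in root.iter(f"{SITEMAP_NS}loc")]
    if any(not loc.startswith("https://") for loc in locs):
        problems.append("sitemap.xml contains a non-absolute URL")
    if any(loc.endswith("/index.md") for loc in locs):
        problems.append("sitemap.xml lists index.md, which is only an alternate")
    return problems
```

Calling check_output(Path("website/output")) after make build would return an empty list when all of these checks pass.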

🤖 Generated with Claude Code


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-05-02 08:18:51 -05:00
Reference: github-starred/awesome-python#16038