[PR #2581] Add ArchiveBox to Web Content Extracting section #1898

Open
opened 2025-11-06 13:25:25 -06:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/vinta/awesome-python/pull/2581
Author: @pirate
Created: 5/4/2024
Status: 🔄 Open

Base: master ← Head: patch-1


📝 Commits (1)

  • 1fa73e3 Add ArchiveBox to Web Content Extracting section

📊 Changes

1 file changed (+1 additions, -0 deletions)

📝 README.md (+1 -0)

📄 Description

What is this Python project?

ArchiveBox is an internet archiving and web content extraction tool. It supports extracting these content types and more:

  • raw HTML, and HTML after JS executes in headless Chrome
  • screenshot & PDF
  • embedded audio, video, subtitles (using yt-dlp)
  • article text and comments
  • git repositories
  • and lots more... (https://github.com/ArchiveBox/ArchiveBox#output-formats)

What's the difference between this Python project and similar ones?

See here:

  • https://github.com/ArchiveBox/ArchiveBox#comparison-to-other-projects
  • https://github.com/ArchiveBox/ArchiveBox/wiki/Web-Archiving-Community#other-archivebox-alternatives

--

Anyone who agrees with this pull request could submit an Approve review to it.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2025-11-06 13:25:25 -06:00
Reference: github-starred/awesome-python#1898