[PR #977] [CLOSED] Adding weboob in the Web Crawling section #882

GiteaMirror · 2025-11-06T13:04:36-06:00

GiteaMirror commented

📋 Pull Request Information

Original PR: https://github.com/vinta/awesome-python/pull/977
Author: @Mistress-Anna
Created: 11/15/2017
Status: ❌ Closed

Base: master ← Head: patch-1

📝 Commits (1)

75fc90d Adding weboob in the Web Crawling section

📊 Changes

1 file changed (+1 additions, -0 deletions)

📝 README.md (+1 -0)

📄 Description

What is this Python project?

WebOOB is a framework for scraping websites and aggregating data from multiple websites.

What's the difference between this Python project and similar ones?

* URL patterns are routed to dedicated Page classes, each holding all the parsing for its page, for cleaner code
* Scraping is made easy thanks to "declarative parsing": each Page can declare a few XPaths and configure a few "filters" to apply to them (parse an int, apply a regex, etc.), and you're set!
* Like every high-level feature in WebOOB, declarative parsing can be disabled locally when it doesn't fit a particular site, and it's always possible to fall back to plain old procedural parsing code
* Pagination handling, with support for infinite iterators
* Typed data models to ensure clean scraped data
* Can handle HTML/XML, JSON, and even XLS or PDF
* (Optional) Can aggregate data from multiple websites by grouping them into categories (for example "video sites", "banking sites", "public transport sites", "event sites", etc.)
* Comes with ~250 built-in website crawling backends
* Has a few graphical and command-line apps to explore and search the scraped data
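
To make the first three points concrete, here is a minimal standard-library sketch of the general idea: URL patterns routed to Page classes, fields declared as path-plus-filter pairs, and a procedural fallback. It does not use WebOOB's actual API; `Filter`, `Page`, `VideoPage`, `SearchPage`, `ROUTES` and `open_page` are hypothetical names invented for illustration, and the markup is assumed to be well-formed XML so that `xml.etree.ElementTree` can parse it.

```python
import re
import xml.etree.ElementTree as ET


class Filter:
    """Hypothetical declarative step: select text by path, then post-process it."""

    def __init__(self, path, convert=str, regex=None):
        self.path = path        # ElementTree path (a limited XPath subset)
        self.convert = convert  # e.g. int, to get a typed value
        self.regex = regex      # optional regex with one capture group

    def __call__(self, root):
        text = (root.findtext(self.path) or "").strip()
        if self.regex is not None:
            match = re.search(self.regex, text)
            text = match.group(1) if match else ""
        return self.convert(text)


class Page:
    """Base page: parse() applies every declared filter to the document."""

    fields = {}

    def __init__(self, markup):
        self.root = ET.fromstring(markup)

    def parse(self):
        return {name: flt(self.root) for name, flt in self.fields.items()}


class VideoPage(Page):
    # Declarative parsing: only paths and filters, no procedural code.
    fields = {
        "title": Filter(".//h1"),
        "views": Filter(".//span[@class='views']", convert=int, regex=r"(\d+)"),
    }


class SearchPage(Page):
    # Procedural fallback: override parse() when declarations do not fit.
    def parse(self):
        return {"results": [a.text for a in self.root.iter("a")]}


# Routing table: the first pattern matching the URL path picks the Page class.
ROUTES = [
    (re.compile(r"/video/\d+$"), VideoPage),
    (re.compile(r"/search$"), SearchPage),
]


def open_page(path, markup):
    for pattern, page_class in ROUTES:
        if pattern.search(path):
            return page_class(markup)
    raise LookupError(f"no page class matches {path!r}")
```

Calling `open_page("/video/42", markup).parse()` on a document containing an `h1` and a `span` with class `views` would return a dict with a string `title` and an integer `views`. The point of the design is that each page's selectors live next to the URL pattern that selects them, while any single page can still opt out and parse procedurally.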

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2025-11-06 13:04:36 -06:00

GiteaMirror closed this issue

2025-11-06 13:04:36 -06:00

GiteaMirror referenced this issue

2026-04-15 09:10:46 -05:00

[PR #882] [CLOSED] add pip-upgrader in Package Management #2896

GiteaMirror referenced this issue

2026-04-17 06:50:28 -05:00

[PR #882] [CLOSED] add pip-upgrader in Package Management #5203

GiteaMirror referenced this issue

2026-04-18 22:04:03 -05:00

[PR #882] [CLOSED] add pip-upgrader in Package Management #7513