[PR #987] [CLOSED] added newspaper to web crawling #892

Closed
opened 2025-11-06 13:04:49 -06:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/vinta/awesome-python/pull/987
Author: @jjanczyszyn
Created: 12/13/2017
Status: Closed

Base: master ← Head: web-crawling


📝 Commits (1)

  • 2865180 added newspaper to web crawling

📊 Changes

1 file changed (+1 additions, -0 deletions)

📝 README.md (+1 -0)

📄 Description

What is this Python project?

A library that makes it easy to crawl for and scrape articles.

What's the difference between this Python project and similar ones?

  • It's extremely easy to use.
  • Multi-threaded article download framework
  • News URL identification
  • Text extraction from HTML
  • Top image extraction from HTML
  • All image extraction from HTML
  • Keyword extraction from text
  • Summary extraction from text
  • Author extraction from text
  • Google trending terms extraction
  • Works in 10+ languages (English, Chinese, German, Arabic, ...)
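
To make two of the listed features concrete ("Text extraction from html" and "Keyword extraction from text"), here is a minimal standard-library sketch. It is not newspaper's implementation and does not use its API. The names `ParagraphText`, `extract_text`, `extract_keywords`, and the tiny `STOPWORDS` set are hypothetical, and ranking words by frequency is a stand-in technique chosen only for illustration.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical, deliberately tiny stopword list (illustration only).
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it", "for", "on", "with"}


class ParagraphText(HTMLParser):
    """Collect the text found inside <p> elements and ignore everything else."""

    def __init__(self):
        super().__init__()
        self._in_paragraph = False
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_paragraph = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_paragraph = False

    def handle_data(self, data):
        if self._in_paragraph:
            self.paragraphs[-1] += data


def extract_text(html):
    """Return the paragraph text of an HTML document, one paragraph per line."""
    parser = ParagraphText()
    parser.feed(html)
    parser.close()
    return "\n".join(p.strip() for p in parser.paragraphs if p.strip())


def extract_keywords(text, limit=5):
    """Return up to `limit` of the most frequent non-stopword words in `text`."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return [word for word, _ in counts.most_common(limit)]


if __name__ == "__main__":
    page = (
        "<html><body><h1>Title</h1>"
        "<p>Crawlers fetch pages.</p><div>nav</div>"
        "<p>Crawlers parse pages and crawlers store text.</p>"
        "</body></html>"
    )
    body = extract_text(page)
    print(body)
    print(extract_keywords(body, limit=2))
```

A real article extractor has to handle much more than this (boilerplate removal, malformed markup, language-specific tokenization), which is the kind of work the PR description credits to the library.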

Anyone who agrees with this pull request could vote for it by adding a 👍 to it, and usually, the maintainer will merge it when votes reach 20.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2025-11-06 13:04:49 -06:00

Reference: github-starred/awesome-python#892