[PR #1147] [CLOSED] Add a python library to extract data from pdf invoices #1029


GiteaMirror · 2025-11-06T13:07:46-06:00

GiteaMirror commented

2025-11-06 13:07:46 -06:00

📋 Pull Request Information

Original PR: https://github.com/vinta/awesome-python/pull/1147
Author: @duskybomb
Created: 10/6/2018
Status: ❌ Closed

Base: master ← Head: patch-1

📝 Commits (1)

6cefb30 Update README.md

📊 Changes

1 file changed (+1 additions, -0 deletions)

View changed files

📝 README.md (+1 -0)

📄 Description

Added invoice2data (A modular Python library to extract data from PDF invoices)

What is this Python project?

A modular Python library to support your accounting process.

- Extracts text from PDF files using different techniques, such as pdftotext, pdfminer, or tesseract OCR.
- Searches the extracted text with regular expressions defined in a YAML-based template system.
- Saves the results as CSV, JSON, or XML, or renames the PDF files to match their content.
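
The three steps above can be sketched with the standard library alone. This is only an illustration of the technique, not invoice2data's actual API: the `TEMPLATE` dictionary stands in for a YAML template, `SAMPLE_TEXT` stands in for text that pdftotext, pdfminer, or tesseract would produce, and every name, field, and pattern here is hypothetical.

```python
import csv
import io
import json
import re

# Hypothetical template. In the project described above, this kind of
# information lives in a YAML file; a plain dict is used here to stay stdlib-only.
TEMPLATE = {
    "issuer": "Example Corp",
    "keywords": ["Example Corp"],
    "fields": {
        "invoice_number": r"Invoice No\.\s*(\w+)",
        "date": r"Date:\s*(\d{4}-\d{2}-\d{2})",
        "amount": r"Total:\s*([\d.]+)",
    },
}

# Stand-in for text already extracted from a PDF (assumed, not real data).
SAMPLE_TEXT = """Example Corp
Invoice No. A1234
Date: 2018-10-06
Total: 99.50
"""


def extract_fields(text, template):
    """Return captured field values, or None if the template does not apply."""
    # A template only applies when all of its keywords occur in the text.
    if not all(keyword in text for keyword in template["keywords"]):
        return None
    record = {"issuer": template["issuer"]}
    for name, pattern in template["fields"].items():
        match = re.search(pattern, text)
        record[name] = match.group(1) if match else None
    return record


def to_json(record):
    """Serialize one record as JSON."""
    return json.dumps(record, indent=2, sort_keys=True)


def to_csv(record):
    """Serialize one record as CSV with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(record), lineterminator="\n")
    writer.writeheader()
    writer.writerow(record)
    return buffer.getvalue()


record = extract_fields(SAMPLE_TEXT, TEMPLATE)
print(to_json(record))
print(to_csv(record))
```

Keeping the patterns in a data structure rather than in code is what makes the approach modular: supporting a new invoice layout means adding a template, not changing the extraction logic.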

What's the difference between this Python project and similar ones?

I haven't come across any similar project in Python.

--

Anyone who agrees with this pull request could vote for it by adding a 👍 to it, and usually, the maintainer will merge it when votes reach 20.

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.


GiteaMirror added the pull-request label 2025-11-06 13:07:46 -06:00

GiteaMirror closed this issue

2025-11-06 13:07:46 -06:00

GiteaMirror referenced this issue

2026-04-15 09:13:52 -05:00

[PR #1029] [CLOSED] Add Using Python to Websites section #3033

GiteaMirror referenced this issue

2026-04-17 06:53:59 -05:00

[PR #1029] [CLOSED] Add Using Python to Websites section #5340

GiteaMirror referenced this issue

2026-04-18 22:08:25 -05:00

[PR #1029] [CLOSED] Add Using Python to Websites section #7650