mirror of
https://github.com/reconurge/flowsint.git
synced 2026-05-07 04:09:49 -05:00
[GH-ISSUE #90] [Request] Extract More From A Webpage #1033
Originally created by @zero77 on GitHub (Dec 3, 2025).
Original GitHub issue: https://github.com/reconurge/flowsint/issues/90
The current "extract text from webpage" option is quite limited and doesn't return much useful text.
Could it be improved to extract more useful text?
iocsearcher could be used, as it is very good at extracting useful text from raw HTML and documents.
Example command:
pip install iocsearcher
@dextmorgn commented on GitHub (Dec 3, 2025):
Hey @zero77,
Thanks for this suggestion. I've added some more info to extract: title, description, status_code, content, headers, technologies. These can be extracted by domain_to_website.
iocsearcher is a good suggestion for the future.
@zero77 commented on GitHub (Dec 3, 2025):
domain_to_website is a good option, but it extracts a lot and can be slow.
Could there be an option to exclude some fields from being extracted?
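One way the requested exclude option could look, as a minimal sketch only. The field names come from the comment above; `extract_fields`, its `exclude` parameter, and the per-field extractor callables are hypothetical and are not part of flowsint's actual API. Each field is computed lazily, so an excluded field costs nothing.

```python
from typing import Any, Callable, Dict, Iterable

# Field names listed in the thread; how flowsint computes them is not shown here.
KNOWN_FIELDS = ("title", "description", "status_code", "content", "headers", "technologies")


def extract_fields(
    extractors: Dict[str, Callable[[], Any]],
    exclude: Iterable[str] = (),
) -> Dict[str, Any]:
    """Run only the extractors whose field name is not excluded.

    `extractors` maps a field name to a zero-argument callable (hypothetical
    stand-ins for the real extraction steps). Excluded callables are never
    invoked, which is what saves time on slow fields.
    """
    excluded = set(exclude)
    unknown = excluded - set(extractors)
    if unknown:
        # Fail loudly on typos instead of silently extracting everything.
        raise ValueError(f"unknown field(s) in exclude: {sorted(unknown)}")

    result: Dict[str, Any] = {}
    for name, extractor in extractors.items():
        if name in excluded:
            continue
        result[name] = extractor()
    return result


if __name__ == "__main__":
    # Dummy extractors standing in for real network or parsing work.
    demo = {
        "title": lambda: "Example Domain",
        "status_code": lambda: 200,
        "technologies": lambda: ["nginx"],
    }
    print(extract_fields(demo, exclude={"technologies"}))
```

Keeping the extractors as callables rather than precomputed values means the slow steps (for example, technology detection) are skipped entirely when excluded, not computed and then discarded.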