mirror of
https://github.com/reconurge/flowsint.git
synced 2026-06-10 18:45:50 -05:00
[PR #151] feat(enrichers): Arabic media enrichers (Sabq, Argaam, Al Arabiya, Nitter) #2621
📋 Pull Request Information
Original PR: https://github.com/reconurge/flowsint/pull/151
Author: @SocialMDev
Created: 6/1/2026
Status: 🔄 Open
Base: main ← Head: feat/arabic-osint-enrichers
📝 Commits (1)
adbcef2 feat(enrichers): add Arabic media enrichers (Sabq, Argaam, Al Arabiya, Nitter)
📊 Changes
20 files changed (+1732 additions, -0 deletions)
📝 flowsint-enrichers/pyproject.toml (+1 -0)
➕ flowsint-enrichers/src/flowsint_enrichers/individual/to_alarabiya.py (+134 -0)
➕ flowsint-enrichers/src/flowsint_enrichers/individual/to_arabic_tweets.py (+136 -0)
➕ flowsint-enrichers/src/flowsint_enrichers/individual/to_argaam.py (+134 -0)
➕ flowsint-enrichers/src/flowsint_enrichers/individual/to_sabq.py (+134 -0)
➕ flowsint-enrichers/src/flowsint_enrichers/phrase/__init__.py (+0 -0)
➕ flowsint-enrichers/src/flowsint_enrichers/phrase/to_alarabiya.py (+127 -0)
➕ flowsint-enrichers/src/flowsint_enrichers/phrase/to_arabic_tweets.py (+127 -0)
➕ flowsint-enrichers/src/flowsint_enrichers/phrase/to_argaam.py (+127 -0)
➕ flowsint-enrichers/src/flowsint_enrichers/phrase/to_sabq.py (+127 -0)
➕ flowsint-enrichers/src/tools/arabic_media/__init__.py (+0 -0)
➕ flowsint-enrichers/src/tools/arabic_media/alarabiya.py (+70 -0)
➕ flowsint-enrichers/src/tools/arabic_media/argaam.py (+66 -0)
➕ flowsint-enrichers/src/tools/arabic_media/nitter.py (+125 -0)
➕ flowsint-enrichers/src/tools/arabic_media/sabq.py (+62 -0)
➕ flowsint-enrichers/tests/enrichers/test_arabic_alarabiya.py (+70 -0)
➕ flowsint-enrichers/tests/enrichers/test_arabic_argaam.py (+73 -0)
➕ flowsint-enrichers/tests/enrichers/test_arabic_sabq.py (+133 -0)
➕ flowsint-enrichers/tests/enrichers/test_arabic_tweets.py (+75 -0)
📝 uv.lock (+11 -0)
📄 Description
Summary
Adds 8 enrichers that surface Arabic-language mentions of an Individual or a Phrase (topic) and link them into the graph as Website nodes with source-specific relationship labels.
individual_to_sabq / phrase_to_sabq: MENTIONED_IN_SABQ
individual_to_argaam / phrase_to_argaam: MENTIONED_IN_ARGAAM
individual_to_alarabiya / phrase_to_alarabiya: MENTIONED_IN_ALARABIYA ([…] site:alarabiya.net)
individual_to_arabic_tweets / phrase_to_arabic_tweets: MENTIONED_ON_TWITTER_AR
What changed
Why a new phrase/ category
Sabq / Argaam / Al Arabiya all support searching for topics, not just people. Phrase was already in flowsint-types but had no enrichers; this PR adds the first set. Topic search is useful for journalists and OSINT investigators tracking issues rather than individuals.
Security notes
[…] AlArabiyaTool for parsing Google News RSS, to avoid XXE / billion-laughs attacks on untrusted XML. Added as a dependency (>=0.7,<0.8).
[…] NITTER_INSTANCES then falls back to a Google dork; tests cover both paths via mocking.
Demo
Brought up docker-compose.dev.yml infra (postgres + redis + neo4j) and ran individual_to_sabq against real Neo4j with HTTP mocked to return 3 fixture article hits for "Faisal Aldeghaither":
The Neo4j Browser visualisation showing the Individual → 3 Website subgraph with MENTIONED_IN_SABQ edges and Arabic article titles is attached in the first PR comment.
Test plan
pytest tests/enrichers/test_arabic_*.py: 19 new tests, all green
tests/enrichers/test_registry.py still passes (21/21 with new suite)
[…] @flowsint_enricher and appear in ENRICHER_REGISTRY
[…] SabqTool, ArgaamTool, AlArabiyaTool, NitterArabicTool are mocked
Notes for maintainer
to_domains.py reference enricher served as the architectural template (preprocess / scan / postprocess split, @flowsint_enricher decorator, module-level InputType/OutputType re-export). I tried to match style and structure exactly; happy to adjust if you want different conventions for the new phrase/ category.
[…] sabq.org and argaam.com are defensive (multiple fallbacks via comma-separated CSS selectors) but will need maintenance if those sites change their markup.
🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.
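For readers skimming the summary, the pairing of enrichers and relationship labels can be sketched as a lookup table. The names and labels below are taken from the summary; the build_edges helper and the (source, label, url) tuple shape are hypothetical illustrations and are not the PR's actual graph-writing code.

```python
from typing import List, Tuple

# Enricher name -> relationship label, as listed in the PR summary.
RELATIONSHIP_LABELS = {
    "individual_to_sabq": "MENTIONED_IN_SABQ",
    "phrase_to_sabq": "MENTIONED_IN_SABQ",
    "individual_to_argaam": "MENTIONED_IN_ARGAAM",
    "phrase_to_argaam": "MENTIONED_IN_ARGAAM",
    "individual_to_alarabiya": "MENTIONED_IN_ALARABIYA",
    "phrase_to_alarabiya": "MENTIONED_IN_ALARABIYA",
    "individual_to_arabic_tweets": "MENTIONED_ON_TWITTER_AR",
    "phrase_to_arabic_tweets": "MENTIONED_ON_TWITTER_AR",
}

# Hypothetical edge shape: (source node id, relationship label, article URL).
Edge = Tuple[str, str, str]


def build_edges(enricher_name: str, source_id: str, article_urls: List[str]) -> List[Edge]:
    """Pair a source node with each distinct article URL under the enricher's label."""
    label = RELATIONSHIP_LABELS[enricher_name]
    # dict.fromkeys drops duplicate URLs while keeping first-seen order.
    unique_urls = list(dict.fromkeys(article_urls))
    return [(source_id, label, url) for url in unique_urls]
```

An unknown enricher name raises KeyError here, which seems preferable to silently writing an unlabeled edge; the real enrichers may handle this differently.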
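The security notes mention trying NITTER_INSTANCES first and then falling back to a Google dork. A minimal sketch of that try-in-order-then-fall-back pattern follows, assuming each source is wrapped in a callable that takes a query and returns a list of URLs. The function name, the Fetcher type, and the treatment of empty results as a miss are all assumptions for illustration, not the behaviour of NitterArabicTool.

```python
from typing import Callable, Iterable, List

# Hypothetical type: a fetcher takes a query string and returns result URLs.
Fetcher = Callable[[str], List[str]]


def search_with_fallback(
    query: str,
    instance_fetchers: Iterable[Fetcher],
    fallback_fetcher: Fetcher,
) -> List[str]:
    """Try each instance in order; use the fallback only if none returns hits."""
    for fetch in instance_fetchers:
        try:
            hits = fetch(query)
        except Exception:
            # An unreachable or broken instance should not abort the search.
            continue
        if hits:
            return hits
    # Every instance failed or came back empty (assumed to count as a miss).
    return fallback_fetcher(query)
```

Passing the fetchers in as arguments keeps both paths easy to exercise with fakes, which matches the note that the tests cover both paths via mocking.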