[GH-ISSUE #132] [DRAFT] RFC: Schema.org as the Core Semantic Type System for Flowsint #93

Open
opened 2026-04-11 08:42:25 -05:00 by GiteaMirror · 0 comments

Originally created by @gustavorps on GitHub (Mar 16, 2026).
Original GitHub issue: https://github.com/reconurge/flowsint/issues/132

RFC Number: 002
Title:      Schema.org as the Core Semantic Type System for Flowsint
Author:     Gustavo RPS <github://username/gustavorps>
Status:     Draft
Created:    2026-03-16
Repository: https://github.com/reconurge/flowsint

Abstract

This RFC proposes adopting Schema.org as the foundational semantic vocabulary for Flowsint's type system (flowsint-types). By grounding every entity — Person, Organization, Domain, Email, Phone, WebSite, and others — in Schema.org's well-established, machine-readable vocabulary, Flowsint gains a common language that is interoperable with the broader web, unambiguous for contributors, and extensible without breaking existing contracts. This document explains the motivation, defines a mapping strategy from current Pydantic models to Schema.org types, proposes a concrete implementation path, and addresses known limitations and open questions.

1. Motivation and Problem Statement

Flowsint currently defines its entity types in flowsint-types as standalone Pydantic models — Domain, IP, ASN, CIDR, Individual, Organization, Email, Phone, Website, SocialProfile, Credential, CryptoWallet, Transaction, NFT, and more. These models are purpose-built and work well within the tool's existing scope, but they carry several structural limitations as the project grows:

1.1 Semantic ambiguity. There is no canonical definition of what "Individual" means in relation to "Organization", or how a "Website" differs from a "Domain". New contributors must read source code to understand entity semantics rather than relying on a shared vocabulary. This friction slows onboarding and increases the likelihood of mismatches between enrichers.

1.2 No cross-tool interoperability. OSINT workflows increasingly span multiple tools — Maltego, SpiderFoot, OpenCTI, and others. Flowsint data exported or exchanged with these tools requires ad-hoc translation layers because there is no common ontological footing. A shared vocabulary would let Flowsint speak the same language as any tool that also maps to Schema.org.

1.3 Limited discoverability and graph reasoning. Flowsint uses Neo4j as its graph backend. Without semantically named entities and relationships, graph queries are brittle, and graph-reasoning tooling cannot take advantage of type hierarchies (e.g., knowing that schema:Person is a subtype of schema:Thing lets a query match a Person wherever a Thing is expected).

1.4 Reinventing solved problems. Schema.org is a community-maintained vocabulary, developed in the open through a W3C Community Group, and is used by billions of web pages and a growing number of knowledge graphs. It already defines types for Person, Organization, WebSite, ContactPoint, PostalAddress, and more that directly overlap with Flowsint's entity set. Maintaining parallel definitions duplicates work that the Schema.org community already does.

2. Proposal Summary

This RFC proposes the following:

  1. Adopt Schema.org URIs as the canonical @type for every Flowsint entity. Each entity in flowsint-types will have a schema_type class variable pointing to its Schema.org equivalent (e.g., "https://schema.org/Person"), declared on a parallel Schema.org-aware model rather than on the existing Pydantic model (see Section 5.1).

  2. Extend Schema.org where no suitable type exists. Types with no Schema.org equivalent (e.g., ASN, CIDR, CryptoWallet) will be defined as Flowsint-specific extensions under a custom namespace (https://schema.flowsint.io/) following Schema.org's own extension convention.

  3. Emit JSON-LD from the Flowsint API on request, via Accept: application/ld+json content negotiation (Section 5.2), so that every entity response can be served semantically typed and directly consumable by any JSON-LD-aware tool.

  4. Store @type as a node label in Neo4j so that graph traversals benefit from semantic type hierarchies.

  5. Keep Pydantic models as the internal contract — this is a non-breaking, additive change. Schema.org alignment is expressed through metadata and serialization, not by replacing the model layer.

3. Schema.org Background

Schema.org is a collaborative, community-driven project founded by Google, Microsoft, Yahoo, and Yandex in 2011 and now maintained under the W3C Schema.org Community Group. It defines a hierarchy of types rooted at schema:Thing, with property definitions that express relationships between types.

Key properties of Schema.org relevant to this proposal:

  • Open vocabulary: All types and properties are publicly defined and versioned at https://schema.org.
  • Extensible: The extension mechanism allows third-party namespaces to add types and properties that inherit from core Schema.org types.
  • JSON-LD native: JSON-LD is the W3C Recommendation for serializing linked data as JSON, and Schema.org is one of the vocabularies most commonly expressed in it.
  • Broad adoption: Used by billions of web pages for structured data markup and by major knowledge graphs including Wikidata and Google's Knowledge Graph.

The current Schema.org release is v29.4 (2025-12-08).

4. Type Mapping

The following table maps existing Flowsint types to their Schema.org equivalents. Where no direct equivalent exists, a Flowsint extension type is proposed.

4.1 Core Entity Mappings

| Flowsint Type | Schema.org Type | Notes |
|---------------|-----------------|-------|
| Individual | schema:Person | Direct mapping. Schema.org Person covers name, identifier, affiliation, email, telephone. |
| Organization | schema:Organization | Direct mapping. Covers legal name, URL, address, founding date, members. |
| Email | schema:ContactPoint (contactType: email) | Schema.org models email as a property (schema:email) or a ContactPoint. Flowsint's richer Email entity maps cleanly to ContactPoint with contactType = "email". |
| Phone | schema:ContactPoint (contactType: phone) | Same pattern as Email. |
| Website | schema:WebSite | Direct mapping. Covers url, name, description. Sub-pages are schema:WebPage. |
| Domain | schema:WebSite + Flowsint extension | Schema.org has no dedicated Domain type. Use schema:WebSite for the root representation and flowsint:Domain as a refinement carrying DNS-specific properties. |
| SocialProfile | schema:ProfilePage | Schema.org 13+ includes ProfilePage. Maps to social media profile pages directly. |
| Credential | flowsint:Credential (extension) | No Schema.org equivalent. Extends schema:Thing. |
| IP | flowsint:IPAddress (extension) | No Schema.org equivalent. Extends schema:Thing. |
| ASN | flowsint:AutonomousSystem (extension) | No Schema.org equivalent. Extends schema:Thing. |
| CIDR | flowsint:CIDRBlock (extension) | No Schema.org equivalent. Extends schema:Thing. |
| CryptoWallet | flowsint:CryptoWallet (extension) | Schema.org has no crypto-native types. Could consider schema:MoneyAccount as a distant analogy, but a clean extension is preferred for accuracy. |
| Transaction | schema:MoneyTransfer | Reasonable mapping for fiat or on-chain value transfer. Crypto-specific fields (hash, gas, block) would be extension properties. |
| NFT | flowsint:NFT (extension) | No Schema.org equivalent. Extends schema:CreativeWork given NFTs often represent digital assets. |
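
The mapping can also be held as data, so that serializers and migrations read from one place. The following is a minimal sketch, not part of flowsint-types: `TYPE_MAP`, `schema_uri`, and `is_extension` are hypothetical names, and Domain is recorded under its flowsint: refinement only.

```python
# Hypothetical registry mirroring the table in 4.1; all names are illustrative.
SCHEMA_NS = "https://schema.org/"
FLOWSINT_NS = "https://schema.flowsint.io/"

# Flowsint type name -> (namespace prefix, local type name)
TYPE_MAP = {
    "Individual": ("schema", "Person"),
    "Organization": ("schema", "Organization"),
    "Email": ("schema", "ContactPoint"),
    "Phone": ("schema", "ContactPoint"),
    "Website": ("schema", "WebSite"),
    "Domain": ("flowsint", "Domain"),
    "SocialProfile": ("schema", "ProfilePage"),
    "Credential": ("flowsint", "Credential"),
    "IP": ("flowsint", "IPAddress"),
    "ASN": ("flowsint", "AutonomousSystem"),
    "CIDR": ("flowsint", "CIDRBlock"),
    "CryptoWallet": ("flowsint", "CryptoWallet"),
    "Transaction": ("schema", "MoneyTransfer"),
    "NFT": ("flowsint", "NFT"),
}

_PREFIXES = {"schema": SCHEMA_NS, "flowsint": FLOWSINT_NS}


def schema_uri(flowsint_type: str) -> str:
    """Return the full type URI for a Flowsint type name."""
    prefix, local = TYPE_MAP[flowsint_type]
    return _PREFIXES[prefix] + local


def is_extension(flowsint_type: str) -> bool:
    """True when the type lives in the flowsint: namespace."""
    return TYPE_MAP[flowsint_type][0] == "flowsint"
```

An unknown type name raises KeyError, which is probably the desired behavior for a registry that is meant to be exhaustive.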

4.2 Relationship Mappings

Schema.org also defines relationships (properties) that map to Flowsint's graph edges:

| Flowsint Edge | Schema.org Property / Flowsint Extension |
|---------------|------------------------------------------|
| Individual → Organization | schema:memberOf / schema:worksFor |
| Individual → Email | schema:email (via ContactPoint) |
| Individual → Phone | schema:telephone (via ContactPoint) |
| Organization → Domain | schema:url + flowsint:ownsDomain |
| Domain → IP | flowsint:resolvesTo |
| IP → ASN | flowsint:belongsToASN |
| ASN → CIDR | flowsint:announcesPrefix |
| CryptoWallet → Transaction | flowsint:hasTransaction |
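
In JSON-LD, an edge of this kind is usually written as a property whose value is a reference to the target node. The sketch below illustrates that shape under stated assumptions: `EDGE_PROPERTIES` and `link` are hypothetical names, the entity IRIs are made up for the example, and only a subset of the edges is listed.

```python
# Hypothetical edge registry mirroring part of the table in 4.2.
EDGE_PROPERTIES = {
    ("Individual", "Organization"): "memberOf",
    ("Domain", "IP"): "resolvesTo",
    ("IP", "ASN"): "belongsToASN",
    ("ASN", "CIDR"): "announcesPrefix",
    ("CryptoWallet", "Transaction"): "hasTransaction",
}


def link(node: dict, source_type: str, target_type: str, target_id: str) -> dict:
    """Return a copy of `node` with an edge to `target_id` added as an @id reference."""
    prop = EDGE_PROPERTIES[(source_type, target_type)]
    linked = dict(node)
    refs = list(linked.get(prop, []))  # copy so the input node is not mutated
    refs.append({"@id": target_id})
    linked[prop] = refs
    return linked


# Illustrative IRIs only; the real identifier scheme is not defined here.
domain = {"@type": "Domain", "@id": "https://schema.flowsint.io/entities/example.com"}
domain = link(domain, "Domain", "IP", "https://schema.flowsint.io/entities/192.0.2.1")
```

Keeping each edge value as a list lets a node carry several targets for the same property, for example a domain that resolves to more than one address.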

5. Proposed Implementation

5.1 New Namespace: flowsint.typing.schema_org

Rather than annotating the existing Pydantic models in flowsint-types directly, a dedicated new sub-package is introduced:

flowsint-types/
  flowsint/
    typing/
      schema_org/
        __init__.py          # Public re-exports
        _base.py             # SchemaOrgMixin and JSON-LD helpers
        _context.py          # JSON-LD context constants
        entities/
          __init__.py
          person.py          # Individual → schema:Person
          organization.py    # Organization → schema:Organization
          contact_point.py   # Email, Phone → schema:ContactPoint
          website.py         # Website → schema:WebSite
          social_profile.py  # SocialProfile → schema:ProfilePage
          transaction.py     # Transaction → schema:MoneyTransfer
        extensions/
          __init__.py
          domain.py          # flowsint:Domain
          ip_address.py      # flowsint:IPAddress
          asn.py             # flowsint:AutonomousSystem
          cidr.py            # flowsint:CIDRBlock
          crypto_wallet.py   # flowsint:CryptoWallet
          nft.py             # flowsint:NFT
          credential.py      # flowsint:Credential
        context.jsonld       # Published JSON-LD context

The existing models in flowsint-types are not modified. The schema_org namespace is a parallel layer that wraps or mirrors them, providing Schema.org-aware serialization, validation, and identity. This separation guarantees that enrichers and API routes that depend on the current models continue to function without any changes.

5.1.1 Base Mixin (_base.py)

All Schema.org-typed models share a common mixin that handles JSON-LD serialization and identity:

# flowsint/typing/schema_org/_base.py

from __future__ import annotations
from typing import ClassVar
from pydantic import BaseModel, ConfigDict
from pydantic.alias_generators import to_camel


FLOWSINT_NS = "https://schema.flowsint.io/"
SCHEMA_NS   = "https://schema.org/"
CONTEXT_URL = "https://schema.flowsint.io/context.jsonld"


class SchemaOrgMixin(BaseModel):
    """
    Mixin that adds Schema.org identity and JSON-LD serialization
    to any Flowsint entity model.

    Subclasses must declare:
        schema_type: ClassVar[str]  (the fully-qualified Schema.org or
                                     flowsint: type URI)
    """

    # snake_case fields serialize under camelCase aliases (given_name -> givenName)
    model_config = ConfigDict(alias_generator=to_camel, populate_by_name=True)

    schema_type: ClassVar[str]        # e.g. "https://schema.org/Person"
    schema_id:   str | None = None    # Optional stable entity IRI

    def to_jsonld(self) -> dict:
        """Return a JSON-LD representation of this entity."""
        payload = self.model_dump(
            by_alias=True, exclude_none=True, exclude={"schema_id"}
        )
        doc = {"@context": CONTEXT_URL, "@type": self.schema_type}
        # Python's id() is a per-process memory address and cannot serve as a
        # stable IRI, so "@id" is emitted only when one was supplied; without
        # it the entity serializes as a blank node.
        if self.schema_id:
            doc["@id"] = self.schema_id
        return {**doc, **payload}

    @classmethod
    def from_flowsint(cls, model) -> "SchemaOrgMixin":
        """
        Construct a Schema.org-typed instance from a raw flowsint-types model.
        Subclasses should override this to apply field mapping.
        """
        return cls(**model.model_dump())
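
When schema_id is not supplied, an IRI that stays the same across processes has to come from the entity's own data rather than from the Python object. The following is a minimal sketch of one option, a name-based UUID; `stable_entity_iri` and `natural_key` are hypothetical, the `entities/` path segment is an assumption, and lowercasing the key is only appropriate for case-insensitive identifiers such as domains.

```python
import uuid

FLOWSINT_NS = "https://schema.flowsint.io/"


def stable_entity_iri(schema_type: str, natural_key: str) -> str:
    """Derive a repeatable entity IRI from the type URI and a natural key.

    uuid5 is a name-based UUID, so the same inputs always give the same IRI.
    """
    normalized = natural_key.strip().lower()  # assumption: key is case-insensitive
    entity_uuid = uuid.uuid5(uuid.NAMESPACE_URL, f"{schema_type}#{normalized}")
    return f"{FLOWSINT_NS}entities/{entity_uuid}"
```

Including the type URI in the hashed name keeps a Person and an Organization that share a key from colliding on the same IRI.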

5.1.2 Example Entity (entities/person.py)

# flowsint/typing/schema_org/entities/person.py

from typing import ClassVar
from flowsint.typing.schema_org._base import SchemaOrgMixin, SCHEMA_NS


class Person(SchemaOrgMixin):
    """
    Maps flowsint-types `Individual` to schema:Person.
    https://schema.org/Person
    """

    schema_type: ClassVar[str] = f"{SCHEMA_NS}Person"

    # snake_case fields; the matching Schema.org property is noted beside each
    given_name:   str | None = None   # schema:givenName
    family_name:  str | None = None   # schema:familyName
    email:        str | None = None   # schema:email
    telephone:    str | None = None   # schema:telephone
    affiliation:  str | None = None   # schema:affiliation (org name)
    identifier:   str | None = None   # schema:identifier

    @classmethod
    def from_flowsint(cls, individual) -> "Person":
        return cls(
            given_name  = individual.name,
            email       = individual.email,
            identifier  = str(individual.id) if individual.id else None,
        )

5.1.3 Example Extension (extensions/ip_address.py)

# flowsint/typing/schema_org/extensions/ip_address.py

from typing import ClassVar
from flowsint.typing.schema_org._base import SchemaOrgMixin, FLOWSINT_NS


class IPAddress(SchemaOrgMixin):
    """
    Flowsint extension type: flowsint:IPAddress.
    Extends schema:Thing — no native Schema.org equivalent exists.
    https://schema.flowsint.io/IPAddress
    """

    schema_type: ClassVar[str] = f"{FLOWSINT_NS}IPAddress"

    address:      str                  # The IP address string (v4 or v6)
    version:      str | None = None    # "IPv4" | "IPv6"
    asn:          str | None = None    # Owning ASN (flowsint:AutonomousSystem IRI)
    country_code: str | None = None    # ISO 3166-1 alpha-2
    city:         str | None = None
    latitude:     float | None = None
    longitude:    float | None = None

    @classmethod
    def from_flowsint(cls, ip) -> "IPAddress":
        return cls(
            address      = ip.address,
            version      = ip.version,
            country_code = ip.country,
            city         = ip.city,
            latitude     = ip.lat,
            longitude    = ip.lon,
        )
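
The version field does not have to be trusted from the source model, since it can be derived from the address string. The following is a small sketch using the standard library; `ip_version_label` is a hypothetical helper, not an existing Flowsint function.

```python
import ipaddress


def ip_version_label(address: str) -> str:
    """Return "IPv4" or "IPv6" for a valid address; raise ValueError otherwise."""
    parsed = ipaddress.ip_address(address)  # raises ValueError on malformed input
    return f"IPv{parsed.version}"
```

Deriving the label this way also gives a validation step at no extra cost, because a malformed address is rejected before it reaches the graph.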

5.1.4 JSON-LD Context (context.jsonld)

The context file would be published at a stable, versioned URL (hosting is an open question, see Section 10) and bundled inside the package at flowsint/typing/schema_org/context.jsonld:

{
  "@context": {
    "@vocab":       "https://schema.org/",
    "schema":       "https://schema.org/",
    "flowsint":     "https://schema.flowsint.io/",

    "Person":        "schema:Person",
    "Organization":  "schema:Organization",
    "ContactPoint":  "schema:ContactPoint",
    "WebSite":       "schema:WebSite",
    "ProfilePage":   "schema:ProfilePage",
    "MoneyTransfer": "schema:MoneyTransfer",

    "IPAddress":       "flowsint:IPAddress",
    "AutonomousSystem":"flowsint:AutonomousSystem",
    "CIDRBlock":       "flowsint:CIDRBlock",
    "Domain":          "flowsint:Domain",
    "CryptoWallet":    "flowsint:CryptoWallet",
    "NFT":             "flowsint:NFT",
    "Credential":      "flowsint:Credential",

    "resolvesTo":      { "@id": "flowsint:resolvesTo",      "@type": "@id" },
    "belongsToASN":    { "@id": "flowsint:belongsToASN",    "@type": "@id" },
    "ownsDomain":      { "@id": "flowsint:ownsDomain",      "@type": "@id" },
    "announcesPrefix": { "@id": "flowsint:announcesPrefix", "@type": "@id" },
    "hasTransaction":  { "@id": "flowsint:hasTransaction",  "@type": "@id" }
  }
}
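
To make the effect of the context concrete, the sketch below shows how a term resolves to a full IRI. It is a deliberately simplified illustration and not a conforming JSON-LD processor: `expand_term` is a hypothetical function, and `CONTEXT` is a hand-copied excerpt of the file above.

```python
# Excerpt of the context above, copied by hand for illustration.
CONTEXT = {
    "@vocab": "https://schema.org/",
    "schema": "https://schema.org/",
    "flowsint": "https://schema.flowsint.io/",
    "Person": "schema:Person",
    "IPAddress": "flowsint:IPAddress",
    "resolvesTo": {"@id": "flowsint:resolvesTo", "@type": "@id"},
}


def expand_term(term: str, context: dict = CONTEXT) -> str:
    """Expand a term or compact IRI to a full IRI (simplified, not spec-complete)."""
    if term.startswith("@") or "://" in term:
        return term                      # keyword or already absolute
    definition = context.get(term)
    if isinstance(definition, dict):
        definition = definition["@id"]   # expanded term definition
    if definition is not None and definition != term:
        return expand_term(definition, context)
    prefix, sep, local = term.partition(":")
    if sep and isinstance(context.get(prefix), str):
        return context[prefix] + local   # compact IRI such as schema:Person
    return context["@vocab"] + term      # fall back to the default vocabulary
```

The last branch matters for the extension types: any property that is not listed in the context falls back to @vocab and is therefore interpreted as a Schema.org property, whether or not Schema.org defines one with that name.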

5.2 API Integration

The flowsint-api layer gains a thin content-negotiation adapter. No existing route signatures change:

# flowsint/api/middleware/schema_org.py

from fastapi import Request, Response
from flowsint.typing.schema_org import to_jsonld

async def schema_org_middleware(request: Request, call_next):
    response = await call_next(request)
    if "application/ld+json" not in request.headers.get("Accept", ""):
        return response

    # call_next() returns a streaming response with no .body() method,
    # so the body is collected from its iterator before re-serializing.
    body = b"".join([chunk async for chunk in response.body_iterator])
    ld = to_jsonld(body)  # re-serialize through the schema_org layer
    return Response(
        content=ld,
        media_type="application/ld+json",
        status_code=response.status_code,
    )

Standard application/json responses are completely unaffected.
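
A substring check on the Accept header also matches a client that lists application/ld+json with q=0, which means "not acceptable". The following is a minimal sketch of a stricter check; `wants_jsonld` is a hypothetical helper, and it ignores wildcards such as */* on purpose so that plain JSON stays the default.

```python
def wants_jsonld(accept_header: str) -> bool:
    """Return True when application/ld+json is listed with a non-zero q-value."""
    for part in accept_header.split(","):
        media_type, *params = (piece.strip() for piece in part.split(";"))
        if media_type.lower() != "application/ld+json":
            continue
        q = 1.0  # q defaults to 1 when the parameter is absent
        for param in params:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat an unparsable q-value as a refusal
        return q > 0
    return False
```

This sketch only answers whether JSON-LD is acceptable at all; it does not rank it against other listed media types.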

5.3 PostgreSQL Migration Script

The following Alembic-compatible migration adds a schema_type column to every entity table and backfills it from the type mapping defined in Section 4.1. This column acts as the durable, queryable semantic tag inside the relational store.

# migrations/versions/0010_add_schema_org_type_column.py
"""Add schema_type column to all entity tables

Revision ID: 0010
Revises: 0009
Create Date: 2026-03-16
"""

from alembic import op
import sqlalchemy as sa

# Mapping: table_name → Schema.org / flowsint: URI
SCHEMA_TYPE_MAP = {
    "individuals":    "https://schema.org/Person",
    "organizations":  "https://schema.org/Organization",
    "emails":         "https://schema.org/ContactPoint",
    "phones":         "https://schema.org/ContactPoint",
    "websites":       "https://schema.org/WebSite",
    "social_profiles":"https://schema.org/ProfilePage",
    "transactions":   "https://schema.org/MoneyTransfer",
    "domains":        "https://schema.flowsint.io/Domain",
    "ips":            "https://schema.flowsint.io/IPAddress",
    "asns":           "https://schema.flowsint.io/AutonomousSystem",
    "cidrs":          "https://schema.flowsint.io/CIDRBlock",
    "crypto_wallets": "https://schema.flowsint.io/CryptoWallet",
    "nfts":           "https://schema.flowsint.io/NFT",
    "credentials":    "https://schema.flowsint.io/Credential",
}


def upgrade() -> None:
    for table, schema_type in SCHEMA_TYPE_MAP.items():
        # 1. Add nullable column first (safe for large tables — no full rewrite)
        op.add_column(
            table,
            sa.Column(
                "schema_type",
                sa.Text,
                nullable=True,
                comment="Schema.org or flowsint: type URI for this entity",
            ),
        )
        # 2. Backfill all existing rows
        op.execute(
            f"UPDATE {table} SET schema_type = '{schema_type}' "
            f"WHERE schema_type IS NULL"
        )
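        # 3. Enforce NOT NULL now that backfill is complete. The server default
        #    keeps existing INSERTs that omit schema_type working.
        op.alter_column(
            table,
            "schema_type",
            nullable=False,
            server_default=schema_type,
        )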

        # 4. Index for fast type-filtered queries
        op.create_index(
            f"ix_{table}_schema_type",
            table,
            ["schema_type"],
        )


def downgrade() -> None:
    for table in SCHEMA_TYPE_MAP:
        op.drop_index(f"ix_{table}_schema_type", table_name=table)
        op.drop_column(table, "schema_type")
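
The add, backfill, and index sequence can be tried without a PostgreSQL instance. The sketch below uses an in-memory SQLite database purely as a stand-in: the table contents are invented, and the NOT NULL step is replaced by a check because SQLite cannot alter an existing column's nullability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE individuals (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO individuals (name) VALUES (?)", [("Ada",), ("Grace",)])

# 1. Add the column as nullable.
conn.execute("ALTER TABLE individuals ADD COLUMN schema_type TEXT")

# 2. Backfill existing rows.
conn.execute(
    "UPDATE individuals SET schema_type = ? WHERE schema_type IS NULL",
    ("https://schema.org/Person",),
)

# 3. Confirm nothing is left NULL before a NOT NULL constraint would be applied.
remaining = conn.execute(
    "SELECT count(*) FROM individuals WHERE schema_type IS NULL"
).fetchone()[0]

# 4. Index for type-filtered queries.
conn.execute("CREATE INDEX ix_individuals_schema_type ON individuals (schema_type)")

rows = conn.execute(
    "SELECT name FROM individuals WHERE schema_type = ? ORDER BY name",
    ("https://schema.org/Person",),
).fetchall()
```

Locking and table-rewrite behavior differ between SQLite and PostgreSQL, so this only demonstrates the order of operations, not production performance.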

5.4 Neo4j Migration Script

The Cypher migration below runs as a one-off script via cypher-shell or through the Neo4j Python driver. It adds a schemaType property to every existing entity node, adds a secondary Schema.org label where the Schema.org type name differs from the Flowsint label, and indexes schemaType for efficient type-based lookup.

// migrations/neo4j/0010_add_schema_org_labels.cypher
//
// Adds schemaType property and secondary Schema.org labels to all
// existing Flowsint entity nodes.
//
// Run with:
//   cypher-shell -u neo4j -p <password> -f 0010_add_schema_org_labels.cypher
// Or via the Python driver, one statement per session.run() call
// (see the Python helper that follows this script).

// ── 1. Add schemaType property ─────────────────────────────────────────────

MATCH (n:Individual)
  SET n.schemaType = 'https://schema.org/Person',
      n.schemaLabel = 'Person';

MATCH (n:Organization)
  SET n.schemaType = 'https://schema.org/Organization',
      n.schemaLabel = 'Organization';

MATCH (n:Email)
  SET n.schemaType = 'https://schema.org/ContactPoint',
      n.schemaLabel = 'ContactPoint',
      n.contactType = 'email';

MATCH (n:Phone)
  SET n.schemaType = 'https://schema.org/ContactPoint',
      n.schemaLabel = 'ContactPoint',
      n.contactType = 'phone';

MATCH (n:Website)
  SET n.schemaType = 'https://schema.org/WebSite',
      n.schemaLabel = 'WebSite';

MATCH (n:SocialProfile)
  SET n.schemaType = 'https://schema.org/ProfilePage',
      n.schemaLabel = 'ProfilePage';

MATCH (n:Transaction)
  SET n.schemaType = 'https://schema.org/MoneyTransfer',
      n.schemaLabel = 'MoneyTransfer';

MATCH (n:Domain)
  SET n.schemaType = 'https://schema.flowsint.io/Domain',
      n.schemaLabel = 'Domain';

MATCH (n:IP)
  SET n.schemaType = 'https://schema.flowsint.io/IPAddress',
      n.schemaLabel = 'IPAddress';

MATCH (n:ASN)
  SET n.schemaType = 'https://schema.flowsint.io/AutonomousSystem',
      n.schemaLabel = 'AutonomousSystem';

MATCH (n:CIDR)
  SET n.schemaType = 'https://schema.flowsint.io/CIDRBlock',
      n.schemaLabel = 'CIDRBlock';

MATCH (n:CryptoWallet)
  SET n.schemaType = 'https://schema.flowsint.io/CryptoWallet',
      n.schemaLabel = 'CryptoWallet';

MATCH (n:NFT)
  SET n.schemaType = 'https://schema.flowsint.io/NFT',
      n.schemaLabel = 'NFT';

MATCH (n:Credential)
  SET n.schemaType = 'https://schema.flowsint.io/Credential',
      n.schemaLabel = 'Credential';

// ── 2. Add secondary Schema.org labels ────────────────────────────────────
//    This enables queries like: MATCH (n:Person) and MATCH (n:Individual)
//    to return the same nodes.

MATCH (n:Individual)   SET n:Person;
MATCH (n:Organization) SET n:Organization;   // already matches
MATCH (n:Email)        SET n:ContactPoint;
MATCH (n:Phone)        SET n:ContactPoint;
MATCH (n:Website)      SET n:WebSite;
MATCH (n:SocialProfile)SET n:ProfilePage;
MATCH (n:Transaction)  SET n:MoneyTransfer;

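// ── 3. Create indexes on schemaType for fast type-filtered lookups ────────
//    Neo4j property indexes are scoped to a single label, and the token
//    lookup index covers labels rather than properties, so one index is
//    created per entity label. This syntax works on Neo4j 4.x and 5.x.

CREATE INDEX ix_individual_schema_type     IF NOT EXISTS FOR (n:Individual)    ON (n.schemaType);
CREATE INDEX ix_organization_schema_type   IF NOT EXISTS FOR (n:Organization)  ON (n.schemaType);
CREATE INDEX ix_email_schema_type          IF NOT EXISTS FOR (n:Email)         ON (n.schemaType);
CREATE INDEX ix_phone_schema_type          IF NOT EXISTS FOR (n:Phone)         ON (n.schemaType);
CREATE INDEX ix_website_schema_type        IF NOT EXISTS FOR (n:Website)       ON (n.schemaType);
CREATE INDEX ix_social_profile_schema_type IF NOT EXISTS FOR (n:SocialProfile) ON (n.schemaType);
CREATE INDEX ix_transaction_schema_type    IF NOT EXISTS FOR (n:Transaction)   ON (n.schemaType);
CREATE INDEX ix_domain_schema_type         IF NOT EXISTS FOR (n:Domain)        ON (n.schemaType);
CREATE INDEX ix_ip_schema_type             IF NOT EXISTS FOR (n:IP)            ON (n.schemaType);
CREATE INDEX ix_asn_schema_type            IF NOT EXISTS FOR (n:ASN)           ON (n.schemaType);
CREATE INDEX ix_cidr_schema_type           IF NOT EXISTS FOR (n:CIDR)          ON (n.schemaType);
CREATE INDEX ix_crypto_wallet_schema_type  IF NOT EXISTS FOR (n:CryptoWallet)  ON (n.schemaType);
CREATE INDEX ix_nft_schema_type            IF NOT EXISTS FOR (n:NFT)           ON (n.schemaType);
CREATE INDEX ix_credential_schema_type     IF NOT EXISTS FOR (n:Credential)    ON (n.schemaType);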

// ── 4. Verify ─────────────────────────────────────────────────────────────

MATCH (n)
WHERE n.schemaType IS NOT NULL
RETURN n.schemaType AS type, count(n) AS count
ORDER BY count DESC;

A Python helper is also provided for running this migration programmatically within the Flowsint startup sequence:

# flowsint/core/migrations/neo4j_schema_org.py

from neo4j import Driver
from pathlib import Path
import logging

logger = logging.getLogger(__name__)

MIGRATION_FILE = Path(__file__).parent / "cypher" / "0010_add_schema_org_labels.cypher"


def _split_statements(cypher: str) -> list[str]:
    """Drop comment-only lines, then split the script on the ';' terminator.

    Splitting on blank lines would send several statements in one call when
    they share a paragraph, which the driver rejects.
    """
    code_lines = [
        line for line in cypher.splitlines()
        if not line.lstrip().startswith("//")
    ]
    return [s.strip() for s in "\n".join(code_lines).split(";") if s.strip()]


def run(driver: Driver, database: str = "neo4j") -> None:
    """
    Apply the Schema.org label migration to an existing Neo4j database.
    Safe to run multiple times (idempotent via SET and IF NOT EXISTS).
    """
    statements = _split_statements(MIGRATION_FILE.read_text(encoding="utf-8"))

    with driver.session(database=database) as session:
        for stmt in statements:
            logger.debug("Running Neo4j migration statement:\n%s", stmt)
            # consume() waits for the auto-commit transaction to finish
            session.run(stmt).consume()

    logger.info("Neo4j Schema.org migration 0010 applied successfully.")
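
Because the Cypher script repeats the same pattern for every label, the statements could also be generated from a mapping instead of being written by hand. The following is a hedged sketch of that idea: `LABEL_MAP` and `migration_statements` are hypothetical, and only three labels are listed for brevity.

```python
# Hypothetical mapping: existing Neo4j label -> (type URI, schemaLabel value).
LABEL_MAP = {
    "Individual": ("https://schema.org/Person", "Person"),
    "Email": ("https://schema.org/ContactPoint", "ContactPoint"),
    "IP": ("https://schema.flowsint.io/IPAddress", "IPAddress"),
}


def migration_statements(label_map: dict) -> list[str]:
    """Build one SET statement and one index statement per existing label."""
    statements = []
    for label, (type_uri, schema_label) in label_map.items():
        statements.append(
            f"MATCH (n:{label}) "
            f"SET n.schemaType = '{type_uri}', "
            f"n.schemaLabel = '{schema_label}'"
        )
        statements.append(
            f"CREATE INDEX ix_{label.lower()}_schema_type IF NOT EXISTS "
            f"FOR (n:{label}) ON (n.schemaType)"
        )
    return statements
```

String interpolation is acceptable here only because the mapping is a constant in the source tree; labels or URIs from user input would need validation first, since Cypher does not accept labels as query parameters.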

6. Benefits

Interoperability. Any system that understands Schema.org or JSON-LD can consume Flowsint entities without a custom adapter. This makes it straightforward to pipe Flowsint output into SIEM platforms, knowledge graphs, or other OSINT tools.

Contributor clarity. When a new enricher author needs to add a type, they have a canonical reference to check first rather than inventing a schema in isolation. The question "does this entity already exist?" has a definitive answer.

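Richer graph queries. Semantic type labels in Neo4j let queries select nodes by Schema.org type. Because Neo4j labels are flat, a query such as "find all entities that are subtypes of schema:Organization" additionally requires each node to carry its ancestor types as labels; Schema.org supplies that hierarchy, so it does not have to be invented and maintained in application code.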
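
One way to make subtype queries work with flat Neo4j labels is to compute a type's ancestor chain at write time and store every entry as a label. The following is a minimal sketch; `PARENTS` is a hand-written excerpt of the Schema.org hierarchy rather than something loaded from Schema.org, and `labels_for` is a hypothetical helper.

```python
# Hand-written excerpt of Schema.org parent links, for illustration only.
PARENTS = {
    "Person": "Thing",
    "Organization": "Thing",
    "Corporation": "Organization",
    "NGO": "Organization",
    "WebSite": "CreativeWork",
    "CreativeWork": "Thing",
}


def labels_for(schema_label: str) -> list[str]:
    """Return the type followed by its ancestors, most specific first."""
    chain = [schema_label]
    while chain[-1] in PARENTS:
        chain.append(PARENTS[chain[-1]])
    return chain
```

A node stored with all labels from `labels_for("Corporation")` would then be returned by a plain MATCH (n:Organization) without any hierarchy logic in the query.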

Future-proofing. Schema.org is actively maintained. As the web evolves — and as OSINT increasingly intersects with structured data on the web — Flowsint's type system evolves with it at no extra cost.

SEO and documentation. If Flowsint ever exposes a public API or documentation site, Schema.org types are directly understood by search engines, improving discoverability of API documentation.

7. Drawbacks and Limitations

7.1 Schema.org is not designed for OSINT. Schema.org's primary audience is web publishers marking up content for search engines. Many OSINT-specific concepts (IP geolocation, ASN routing, credential exposure) are outside its scope and must be extensions. This means Flowsint cannot fully delegate to Schema.org — it must maintain its own namespace for a significant portion of its type system.

7.2 Property naming differences. Schema.org uses camelCase property names (givenName, legalName, foundingDate) that may differ from Flowsint's current snake_case Pydantic fields. Mapping between the two requires care to avoid confusion. The to_jsonld() method must handle this translation explicitly.
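
The translation itself is mechanical for most fields. The following is a minimal standard-library sketch (Pydantic also ships an alias generator for the same purpose); `to_camel` and `rename_keys` are illustrative names, and fields whose Schema.org name is not a direct camelCase form of the Flowsint name would still need an explicit mapping.

```python
def to_camel(snake: str) -> str:
    """Convert a snake_case field name to camelCase (given_name -> givenName)."""
    head, *tail = snake.split("_")
    return head + "".join(part.capitalize() for part in tail)


def rename_keys(payload: dict) -> dict:
    """Return a copy of `payload` with every top-level key converted to camelCase."""
    return {to_camel(key): value for key, value in payload.items()}
```

Only top-level keys are renamed here; nested objects would need the same treatment applied recursively.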

7.3 Schema.org evolves. A type present in Schema.org v29 may be renamed, deprecated, or restructured in a future release. Flowsint would need a policy for tracking Schema.org releases and updating mappings accordingly. This is manageable but not zero-cost.

7.4 Adds conceptual overhead for contributors. Contributors unfamiliar with Schema.org, JSON-LD, or RDF concepts may find the new machinery confusing. Good documentation and a clear "quick start for adding a new type" guide would mitigate this.

8. Alternatives Considered

8.1 STIX/TAXII. The Structured Threat Information eXpression (STIX) standard is the de facto vocabulary in cyber threat intelligence. It covers many of the same entities (IP, Domain, Email, Person, Organization) in a security-native way. STIX was not chosen as the primary mapping for two reasons: (a) it is significantly more verbose and complex than needed for Flowsint's graph-based exploration model, and (b) Schema.org is more broadly understood outside the security community, which Flowsint's journalist and researcher audience will appreciate. A secondary STIX serialization could be offered in a future RFC.

8.2 Wikidata / Linked Open Data. Wikidata's property and type system is extremely expressive but also extremely granular and requires deep familiarity with Q-codes. It is not appropriate as a primary vocabulary for an early-stage project.

8.3 Custom Flowsint ontology (status quo++). Simply documenting the existing types more thoroughly does not solve the interoperability or semantic ambiguity problems. A custom ontology from scratch would replicate work already done by Schema.org for the overlapping types.

9. Migration Path

This change is intended to be non-breaking and incremental:

| Phase | Action | Breaking? |
|-------|--------|-----------|
| 1 | Create flowsint.typing.schema_org namespace and SchemaOrgMixin | No |
| 2 | Implement all entity and extension models under the new namespace | No |
| 3 | Publish context.jsonld inside the package and at a stable URL | No |
| 4 | Run PostgreSQL Alembic migration 0010_add_schema_org_type_column | No |
| 5 | Run Neo4j Cypher migration 0010_add_schema_org_labels.cypher | No |
| 6 | Add Accept: application/ld+json content negotiation to the API | No |
| 7 | (Optional) Rename Pydantic fields in flowsint-types to align with Schema.org property names | Yes (semver major) |

Phases 1–6 can ship together as a minor version since they are entirely additive: existing models and API responses are untouched, and the database and graph migrations only add columns, properties, labels, and indexes. Phase 7 is a separate, explicitly opt-in migration behind a semver major bump and should only be pursued if the community concludes the naming alignment is worth the breaking change.

10. Open Questions

The following questions are raised for community discussion:

  1. Should flowsint: extension types also be submitted upstream to Schema.org? Types like IPAddress and AutonomousSystem are general enough to be useful beyond OSINT. Submitting proposals to the W3C Schema.org Community Group is feasible but requires sustained engagement.

  2. Where should the context.jsonld be hosted? Options include the repository itself (requiring a versioned URL like https://raw.githubusercontent.com/reconurge/flowsint/v1.2.0/schema/context.jsonld), a dedicated domain (https://schema.flowsint.io/), or a GitHub Pages deployment. Each has trade-offs in terms of stability and maintenance.

  3. Should STIX be offered as a secondary serialization format? Given Flowsint's cybersecurity audience, a to_stix() method alongside to_jsonld() could significantly improve interoperability with professional threat intelligence platforms. This is out of scope for this RFC but recommended as a follow-on.

  4. How should CryptoWallet relate to Schema.org's financial types? Schema.org has FinancialProduct, BankAccount, and MoneyAccount. None are a clean fit for a cryptographic wallet. A thorough review of Schema.org's financial extension vocabulary is warranted before finalising the extension definition.

11. Reference Implementation

A reference branch demonstrating the proposed changes across flowsint-types (new flowsint.typing.schema_org namespace), flowsint-core (Neo4j migration helper), and flowsint-api (content negotiation middleware) will be linked here once available:

Branch: feat/rfc-001-schema-org-namespace
PR:     [to be opened]

Key files introduced:
  flowsint-types/flowsint/typing/schema_org/         ← new namespace
  flowsint-core/flowsint/core/migrations/cypher/
    0010_add_schema_org_labels.cypher
  flowsint-core/flowsint/core/migrations/
    neo4j_schema_org.py
  flowsint-api/alembic/versions/
    0010_add_schema_org_type_column.py

12. References

13. Acknowledgements

This RFC was prepared for the Flowsint community. Feedback from maintainers, enricher authors, and OSINT practitioners is actively sought. Please open a GitHub issue referencing RFC-002 to comment, or submit a pull request against this document.


RFC-002 · Draft · 2026-03-16

Originally created by @gustavorps on GitHub (Mar 16, 2026). Original GitHub issue: https://github.com/reconurge/flowsint/issues/132 ``` RFC Number: 002 Title: Schema.org as the Core Semantic Type System for Flowsint Author: Gustavo RPS <github://username/gustavorps> Status: Draft Created: 2026-03-16 Repository: https://github.com/reconurge/flowsint ``` ## Abstract This RFC proposes adopting [Schema.org](https://schema.org) as the foundational semantic vocabulary for Flowsint's type system (`flowsint-types`). By grounding every entity — `Person`, `Organization`, `Domain`, `Email`, `Phone`, `WebSite`, and others — in Schema.org's well-established, machine-readable vocabulary, Flowsint gains a common language that is interoperable with the broader web, unambiguous for contributors, and extensible without breaking existing contracts. This document explains the motivation, defines a mapping strategy from current Pydantic models to Schema.org types, proposes a concrete implementation path, and addresses known limitations and open questions. ## 1. Motivation and Problem Statement Flowsint currently defines its entity types in `flowsint-types` as standalone Pydantic models — `Domain`, `IP`, `ASN`, `CIDR`, `Individual`, `Organization`, `Email`, `Phone`, `Website`, `SocialProfile`, `Credential`, `CryptoWallet`, `Transaction`, `NFT`, and more. These models are purpose-built and work well within the tool's existing scope, but they carry several structural limitations as the project grows: **1.1 Semantic ambiguity.** There is no canonical definition of what "Individual" means in relation to "Organization", or how a "Website" differs from a "Domain". New contributors must read source code to understand entity semantics rather than relying on a shared vocabulary. This friction slows onboarding and increases the likelihood of mismatches between enrichers. 

**1.2 No cross-tool interoperability.** OSINT workflows increasingly span multiple tools such as Maltego, SpiderFoot, and OpenCTI. Flowsint data exported to or exchanged with these tools requires ad-hoc translation layers because there is no common ontological footing. A shared vocabulary would let Flowsint speak the same language as any tool that also maps to Schema.org.

**1.3 Limited discoverability and graph reasoning.** Flowsint uses Neo4j as its graph backend. Without typed, semantically named relationships and entities, graph queries are brittle, and graph-based reasoning tools cannot leverage type hierarchies (e.g., knowing that `schema:Person` is a subtype of `schema:Thing` lets a query engine traverse more intelligently).

**1.4 Reinventing solved problems.** Schema.org is a community-maintained vocabulary, developed through a W3C Community Group, that is used by billions of web pages and a growing number of knowledge graphs. It already defines types such as `Person`, `Organization`, `WebSite`, `ContactPoint`, and `PostalAddress` that directly overlap with Flowsint's entity set. Maintaining parallel definitions duplicates work the Schema.org community already does.

## 2. Proposal Summary

This RFC proposes the following:

1. **Adopt Schema.org URIs as the canonical `@type` for every Flowsint entity.** Each entity gets a Schema.org-typed model that declares a `schema_type` class variable pointing to its Schema.org equivalent (e.g., `"https://schema.org/Person"`). Section 5.1 places these models in a new namespace rather than editing the existing ones.
2. **Extend Schema.org where no suitable type exists.** Types with no Schema.org equivalent (e.g., `ASN`, `CIDR`, `CryptoWallet`) will be defined as Flowsint-specific extensions under a custom namespace (`https://schema.flowsint.io/`) following Schema.org's own extension convention.
3. **Offer JSON-LD output from the Flowsint API**, selected through content negotiation (Section 5.2), so that every entity response can be semantically typed and directly consumed by any JSON-LD-aware tool.
4. **Store the Schema.org type on Neo4j nodes**, as a secondary label and a `schemaType` property (Section 5.4), so that graph queries can filter and traverse by semantic type.
5. **Keep Pydantic models as the internal contract.** This is a non-breaking, additive change. Schema.org alignment is expressed through metadata and serialization, not by replacing the model layer.

## 3. Schema.org Background

Schema.org is a collaborative, community-driven project launched in 2011 by Google, Microsoft, and Yahoo, joined later that year by Yandex, and now maintained through the [W3C Schema.org Community Group](https://www.w3.org/community/schemaorg). It defines a hierarchy of types rooted at `schema:Thing`, with property definitions that express relationships between types.

Key properties of Schema.org relevant to this proposal:

- **Open vocabulary:** All types and properties are publicly defined and versioned at `https://schema.org`.
- **Extensible:** The extension mechanism allows third-party namespaces to add types and properties that inherit from core Schema.org types.
- **JSON-LD native:** JSON-LD, the W3C Recommendation for serializing linked data as JSON, is a first-class syntax for publishing Schema.org data.
- **Broad adoption:** Used by billions of web pages for structured data markup and consumed by large knowledge graphs such as Google's Knowledge Graph. Wikidata maintains mappings to Schema.org terms.

The current Schema.org release is **v29.4 (2025-12-08)**.

## 4. Type Mapping

The following table maps existing Flowsint types to their Schema.org equivalents. Where no direct equivalent exists, a Flowsint extension type is proposed.

### 4.1 Core Entity Mappings

| Flowsint Type | Schema.org Type | Notes |
|---------------|-----------------|-------|
| `Individual` | `schema:Person` | Direct mapping. Schema.org `Person` covers name, identifier, affiliation, email, telephone. |
| `Organization` | `schema:Organization` | Direct mapping. Covers legal name, URL, address, founding date, members. |
| `Email` | `schema:ContactPoint` (`contactType: email`) | Schema.org models email as a property (`schema:email`) or a `ContactPoint`. Flowsint's richer `Email` entity maps cleanly to `ContactPoint` with `contactType = "email"`. |
| `Phone` | `schema:ContactPoint` (`contactType: phone`) | Same pattern as Email. |
| `Website` | `schema:WebSite` | Direct mapping. Covers `url`, `name`, `description`. Sub-pages are `schema:WebPage`. |
| `Domain` | `schema:WebSite` + Flowsint extension | Schema.org has no dedicated `Domain` type. Use `schema:WebSite` for the root representation and `flowsint:Domain` as a refinement carrying DNS-specific properties. |
| `SocialProfile` | `schema:ProfilePage` | Schema.org 13+ includes `ProfilePage`. Maps to social media profile pages directly. |
| `Credential` | `flowsint:Credential` (extension) | No Schema.org equivalent. Extends `schema:Thing`. |
| `IP` | `flowsint:IPAddress` (extension) | No Schema.org equivalent. Extends `schema:Thing`. |
| `ASN` | `flowsint:AutonomousSystem` (extension) | No Schema.org equivalent. Extends `schema:Thing`. |
| `CIDR` | `flowsint:CIDRBlock` (extension) | No Schema.org equivalent. Extends `schema:Thing`. |
| `CryptoWallet` | `flowsint:CryptoWallet` (extension) | Schema.org has no crypto-native types. Could consider `schema:MoneyAccount` as a distant analogy, but a clean extension is preferred for accuracy. |
| `Transaction` | `schema:MoneyTransfer` | Reasonable mapping for fiat or on-chain value transfer. Crypto-specific fields (hash, gas, block) would be extension properties. |
| `NFT` | `flowsint:NFT` (extension) | No Schema.org equivalent. Extends `schema:CreativeWork` given NFTs often represent digital assets. |

### 4.2 Relationship Mappings

Schema.org also defines relationships (properties) that map to Flowsint's graph edges:

| Flowsint Edge | Schema.org Property / Flowsint Extension |
|---------------|------------------------------------------|
| `Individual → Organization` | `schema:memberOf` / `schema:worksFor` |
| `Individual → Email` | `schema:email` (via `ContactPoint`) |
| `Individual → Phone` | `schema:telephone` (via `ContactPoint`) |
| `Organization → Domain` | `schema:url` + `flowsint:ownsDomain` |
| `Domain → IP` | `flowsint:resolvesTo` |
| `IP → ASN` | `flowsint:belongsToASN` |
| `ASN → CIDR` | `flowsint:announcesPrefix` |
| `CryptoWallet → Transaction` | `flowsint:hasTransaction` |

## 5. Proposed Implementation

### 5.1 New Namespace: `flowsint.typing.schema_org`

Rather than annotating the existing Pydantic models in `flowsint-types` directly, a dedicated new sub-package is introduced:

```
flowsint-types/
  flowsint/
    typing/
      schema_org/
        __init__.py            # Public re-exports
        _base.py               # SchemaOrgMixin and JSON-LD helpers
        _context.py            # JSON-LD context constants
        entities/
          __init__.py
          person.py            # Individual → schema:Person
          organization.py      # Organization → schema:Organization
          contact_point.py     # Email, Phone → schema:ContactPoint
          website.py           # Website → schema:WebSite
          social_profile.py    # SocialProfile → schema:ProfilePage
          transaction.py       # Transaction → schema:MoneyTransfer
        extensions/
          __init__.py
          domain.py            # flowsint:Domain
          ip_address.py        # flowsint:IPAddress
          asn.py               # flowsint:AutonomousSystem
          cidr.py              # flowsint:CIDRBlock
          crypto_wallet.py     # flowsint:CryptoWallet
          nft.py               # flowsint:NFT
          credential.py        # flowsint:Credential
        context.jsonld         # Published JSON-LD context
```

The existing models in `flowsint-types` are **not modified**. The `schema_org` namespace is a parallel layer that wraps or mirrors them, providing Schema.org-aware serialization, validation, and identity.
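
As a rough illustration of what "wraps or mirrors" could look like, the sketch below uses stdlib dataclasses as stand-ins for the Pydantic models so that it runs without dependencies. `Individual`, its fields, and `PersonView` are hypothetical names chosen for this example. They are not the actual `flowsint-types` definitions.

```python
from dataclasses import asdict, dataclass
from typing import Optional

SCHEMA_NS = "https://schema.org/"


@dataclass(frozen=True)
class Individual:
    """Stand-in for an existing flowsint-types model (fields are assumed)."""
    name: str
    email: Optional[str] = None


@dataclass(frozen=True)
class PersonView:
    """Hypothetical Schema.org-aware mirror living in the parallel layer."""
    given_name: str
    email: Optional[str] = None

    # No annotation, so this is a plain class attribute rather than a dataclass field.
    schema_type = f"{SCHEMA_NS}Person"

    @classmethod
    def from_flowsint(cls, individual: Individual) -> "PersonView":
        # Reads from the original model; never mutates or subclasses it.
        return cls(given_name=individual.name, email=individual.email)

    def to_jsonld(self) -> dict:
        # Drop unset fields and rename snake_case fields to Schema.org terms.
        rename = {"given_name": "givenName"}
        body = {rename.get(k, k): v for k, v in asdict(self).items() if v is not None}
        return {"@type": self.schema_type, **body}


original = Individual(name="Ada", email="ada@example.org")
view = PersonView.from_flowsint(original)
print(view.to_jsonld())
```

The `Individual` instance is only read, never changed, and only the view knows anything about Schema.org.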
This separation guarantees that enrichers and API routes that depend on the current models continue to function without any changes.

#### 5.1.1 Base Mixin (`_base.py`)

All Schema.org-typed models share a common mixin that handles JSON-LD serialization and identity:

```python
# flowsint/typing/schema_org/_base.py
from __future__ import annotations

from typing import ClassVar
from uuid import uuid4

from pydantic import BaseModel, ConfigDict
from pydantic.alias_generators import to_camel

FLOWSINT_NS = "https://schema.flowsint.io/"
SCHEMA_NS = "https://schema.org/"
CONTEXT_URL = "https://schema.flowsint.io/context.jsonld"


class SchemaOrgMixin(BaseModel):
    """
    Mixin that adds Schema.org identity and JSON-LD serialization
    to any Flowsint entity model.

    Subclasses must declare:
        schema_type: ClassVar[str]  (the fully-qualified Schema.org or flowsint: type URI)
    """

    # snake_case fields serialize under camelCase Schema.org names (Section 7.2);
    # populate_by_name keeps construction by field name working.
    model_config = ConfigDict(alias_generator=to_camel, populate_by_name=True)

    schema_type: ClassVar[str]        # e.g. "https://schema.org/Person"
    schema_id: str | None = None      # Optional stable entity IRI

    def to_jsonld(self) -> dict:
        """Return a JSON-LD representation of this entity."""
        payload = self.model_dump(
            by_alias=True, exclude_none=True, exclude={"schema_id"}
        )
        return {
            "@context": CONTEXT_URL,
            "@type": self.schema_type,
            # Without a stable IRI, mint a fresh one. id(self) is a memory address
            # that can be reused, so it is not a safe identifier.
            "@id": self.schema_id or f"{FLOWSINT_NS}entities/{uuid4()}",
            **payload,
        }

    @classmethod
    def from_flowsint(cls, model) -> "SchemaOrgMixin":
        """
        Construct a Schema.org-typed instance from a raw flowsint-types model.
        Subclasses should override this to apply field mapping.
        """
        return cls(**model.model_dump())
```

#### 5.1.2 Example Entity (`entities/person.py`)

```python
# flowsint/typing/schema_org/entities/person.py
from typing import ClassVar

from flowsint.typing.schema_org._base import SchemaOrgMixin, SCHEMA_NS


class Person(SchemaOrgMixin):
    """
    Maps flowsint-types `Individual` to schema:Person.
    https://schema.org/Person
    """

    schema_type: ClassVar[str] = f"{SCHEMA_NS}Person"

    # snake_case fields; the mixin's alias generator emits the camelCase names
    given_name: str | None = None     # schema:givenName
    family_name: str | None = None    # schema:familyName
    email: str | None = None          # schema:email
    telephone: str | None = None      # schema:telephone
    affiliation: str | None = None    # schema:affiliation (org name)
    identifier: str | None = None     # schema:identifier

    @classmethod
    def from_flowsint(cls, individual) -> "Person":
        # Attribute names on `individual` follow the current Individual model.
        return cls(
            given_name=individual.name,
            email=individual.email,
            identifier=str(individual.id) if individual.id else None,
        )
```

#### 5.1.3 Example Extension (`extensions/ip_address.py`)

```python
# flowsint/typing/schema_org/extensions/ip_address.py
from typing import ClassVar

from flowsint.typing.schema_org._base import SchemaOrgMixin, FLOWSINT_NS


class IPAddress(SchemaOrgMixin):
    """
    Flowsint extension type: flowsint:IPAddress.
    Extends schema:Thing because no native Schema.org equivalent exists.
    https://schema.flowsint.io/IPAddress
    """

    schema_type: ClassVar[str] = f"{FLOWSINT_NS}IPAddress"

    address: str                        # The IP address string (v4 or v6)
    version: str | None = None          # "IPv4" | "IPv6"
    asn: str | None = None              # Owning ASN (flowsint:AutonomousSystem IRI)
    country_code: str | None = None     # ISO 3166-1 alpha-2
    city: str | None = None
    latitude: float | None = None
    longitude: float | None = None

    @classmethod
    def from_flowsint(cls, ip) -> "IPAddress":
        # Attribute names on `ip` follow the current IP model.
        return cls(
            address=ip.address,
            version=ip.version,
            country_code=ip.country,
            city=ip.city,
            latitude=ip.lat,
            longitude=ip.lon,
        )
```

#### 5.1.4 JSON-LD Context (`context.jsonld`)

The context file is published at a stable, versioned URL and bundled inside the package at `flowsint/typing/schema_org/context.jsonld`:

```json
{
  "@context": {
    "@vocab": "https://schema.org/",
    "schema": "https://schema.org/",
    "flowsint": "https://schema.flowsint.io/",

    "Person": "schema:Person",
    "Organization": "schema:Organization",
    "ContactPoint": "schema:ContactPoint",
    "WebSite": "schema:WebSite",
    "ProfilePage": "schema:ProfilePage",
    "MoneyTransfer": "schema:MoneyTransfer",

    "IPAddress": "flowsint:IPAddress",
    "AutonomousSystem": "flowsint:AutonomousSystem",
    "CIDRBlock": "flowsint:CIDRBlock",
    "Domain": "flowsint:Domain",
    "CryptoWallet": "flowsint:CryptoWallet",
    "NFT": "flowsint:NFT",
    "Credential": "flowsint:Credential",

    "resolvesTo": { "@id": "flowsint:resolvesTo", "@type": "@id" },
    "belongsToASN": { "@id": "flowsint:belongsToASN", "@type": "@id" },
    "ownsDomain": { "@id": "flowsint:ownsDomain", "@type": "@id" },
    "announcesPrefix": { "@id": "flowsint:announcesPrefix", "@type": "@id" },
    "hasTransaction": { "@id": "flowsint:hasTransaction", "@type": "@id" }
  }
}
```

### 5.2 API Integration

The `flowsint-api` layer gains a thin content-negotiation adapter.
No existing route signatures change:

```python
# flowsint/api/middleware/schema_org.py
from fastapi import Request, Response

# Module-level helper (JSON bytes in, JSON-LD bytes out) that the package
# would need to export; this RFC does not define it yet.
from flowsint.typing.schema_org import to_jsonld

JSONLD_MEDIA_TYPE = "application/ld+json"


async def schema_org_middleware(request: Request, call_next):
    response = await call_next(request)
    if JSONLD_MEDIA_TYPE not in request.headers.get("Accept", ""):
        return response

    # call_next returns a streaming response, so the body is collected
    # from its iterator before being re-serialized through the schema_org layer.
    body = b"".join([chunk async for chunk in response.body_iterator])
    return Response(
        content=to_jsonld(body),
        media_type=JSONLD_MEDIA_TYPE,
        status_code=response.status_code,
    )
```

Standard `application/json` responses are completely unaffected.

### 5.3 PostgreSQL Migration Script

The following Alembic-compatible migration adds a `schema_type` column to every entity table and backfills it from the type mapping defined in Section 4.1. This column acts as the durable, queryable semantic tag inside the relational store.

```python
# migrations/versions/0010_add_schema_org_type_column.py
"""Add schema_type column to all entity tables

Revision ID: 0010
Revises: 0009
Create Date: 2026-03-16
"""
from alembic import op
import sqlalchemy as sa

# Revision identifiers used by Alembic (taken from the docstring above)
revision = "0010"
down_revision = "0009"

# Mapping: table_name → Schema.org / flowsint: URI
SCHEMA_TYPE_MAP = {
    "individuals":     "https://schema.org/Person",
    "organizations":   "https://schema.org/Organization",
    "emails":          "https://schema.org/ContactPoint",
    "phones":          "https://schema.org/ContactPoint",
    "websites":        "https://schema.org/WebSite",
    "social_profiles": "https://schema.org/ProfilePage",
    "transactions":    "https://schema.org/MoneyTransfer",
    "domains":         "https://schema.flowsint.io/Domain",
    "ips":             "https://schema.flowsint.io/IPAddress",
    "asns":            "https://schema.flowsint.io/AutonomousSystem",
    "cidrs":           "https://schema.flowsint.io/CIDRBlock",
    "crypto_wallets":  "https://schema.flowsint.io/CryptoWallet",
    "nfts":            "https://schema.flowsint.io/NFT",
    "credentials":     "https://schema.flowsint.io/Credential",
}


def upgrade() -> None:
    for table, schema_type in SCHEMA_TYPE_MAP.items():
        # 1. Add nullable column first (safe for large tables: no full rewrite)
        op.add_column(
            table,
            sa.Column(
                "schema_type",
                sa.Text,
                nullable=True,
                comment="Schema.org or flowsint: type URI for this entity",
            ),
        )
        # 2. Backfill all existing rows (values come from the constant map above)
        op.execute(
            f"UPDATE {table} SET schema_type = '{schema_type}' "
            f"WHERE schema_type IS NULL"
        )
        # 3. Enforce NOT NULL now that backfill is complete
        op.alter_column(table, "schema_type", nullable=False)
        # 4. Index for fast type-filtered queries
        op.create_index(f"ix_{table}_schema_type", table, ["schema_type"])


def downgrade() -> None:
    for table in SCHEMA_TYPE_MAP:
        op.drop_index(f"ix_{table}_schema_type", table_name=table)
        op.drop_column(table, "schema_type")
```

### 5.4 Neo4j Migration Script

The Cypher migration below runs as a one-off script via `cypher-shell` or through the Neo4j Python driver. It adds a `schemaType` property and a secondary Schema.org label to every existing node, and creates indexes for efficient type-based lookups.

```cypher
// migrations/neo4j/0010_add_schema_org_labels.cypher
//
// Adds schemaType property and secondary Schema.org labels to all
// existing Flowsint entity nodes.
//
// Run with:
//   cypher-shell -u neo4j -p <password> -f 0010_add_schema_org_labels.cypher
// Or via the Python helper below, which executes one statement at a time
// (a single session.run() call accepts only one statement).

// ── 1. Add schemaType property ─────────────────────────────────────────────
MATCH (n:Individual)    SET n.schemaType = 'https://schema.org/Person',                   n.schemaLabel = 'Person';
MATCH (n:Organization)  SET n.schemaType = 'https://schema.org/Organization',             n.schemaLabel = 'Organization';
MATCH (n:Email)         SET n.schemaType = 'https://schema.org/ContactPoint',             n.schemaLabel = 'ContactPoint', n.contactType = 'email';
MATCH (n:Phone)         SET n.schemaType = 'https://schema.org/ContactPoint',             n.schemaLabel = 'ContactPoint', n.contactType = 'phone';
MATCH (n:Website)       SET n.schemaType = 'https://schema.org/WebSite',                  n.schemaLabel = 'WebSite';
MATCH (n:SocialProfile) SET n.schemaType = 'https://schema.org/ProfilePage',              n.schemaLabel = 'ProfilePage';
MATCH (n:Transaction)   SET n.schemaType = 'https://schema.org/MoneyTransfer',            n.schemaLabel = 'MoneyTransfer';
MATCH (n:Domain)        SET n.schemaType = 'https://schema.flowsint.io/Domain',           n.schemaLabel = 'Domain';
MATCH (n:IP)            SET n.schemaType = 'https://schema.flowsint.io/IPAddress',        n.schemaLabel = 'IPAddress';
MATCH (n:ASN)           SET n.schemaType = 'https://schema.flowsint.io/AutonomousSystem', n.schemaLabel = 'AutonomousSystem';
MATCH (n:CIDR)          SET n.schemaType = 'https://schema.flowsint.io/CIDRBlock',        n.schemaLabel = 'CIDRBlock';
MATCH (n:CryptoWallet)  SET n.schemaType = 'https://schema.flowsint.io/CryptoWallet',     n.schemaLabel = 'CryptoWallet';
MATCH (n:NFT)           SET n.schemaType = 'https://schema.flowsint.io/NFT',              n.schemaLabel = 'NFT';
MATCH (n:Credential)    SET n.schemaType = 'https://schema.flowsint.io/Credential',       n.schemaLabel = 'Credential';

// ── 2. Add secondary Schema.org labels ─────────────────────────────────────
// This enables queries like: MATCH (n:Person) and MATCH (n:Individual)
// to return the same nodes.
MATCH (n:Individual)    SET n:Person;
MATCH (n:Organization)  SET n:Organization;  // already matches
MATCH (n:Email)         SET n:ContactPoint;
MATCH (n:Phone)         SET n:ContactPoint;
MATCH (n:Website)       SET n:WebSite;
MATCH (n:SocialProfile) SET n:ProfilePage;
MATCH (n:Transaction)   SET n:MoneyTransfer;

// ── 3. Create indexes on schemaType for fast type-filtered lookups ─────────
// Property indexes are declared per label, so one is created for each entity label.
CREATE INDEX ix_individual_schema_type   IF NOT EXISTS FOR (n:Individual)   ON (n.schemaType);
CREATE INDEX ix_organization_schema_type IF NOT EXISTS FOR (n:Organization) ON (n.schemaType);
// ... (repeat for each label)

// ── 4. Verify ──────────────────────────────────────────────────────────────
MATCH (n) WHERE n.schemaType IS NOT NULL
RETURN n.schemaType AS type, count(n) AS count
ORDER BY count DESC;
```

A Python helper is also provided for running this migration programmatically within the Flowsint startup sequence:

```python
# flowsint/core/migrations/neo4j_schema_org.py
import logging
import re
from pathlib import Path

from neo4j import Driver

logger = logging.getLogger(__name__)

MIGRATION_FILE = Path(__file__).parent / "cypher" / "0010_add_schema_org_labels.cypher"

# Matches a // comment at line start or after whitespace, so the "//" inside
# 'https://...' string literals is left alone.
_COMMENT = re.compile(r"(^|\s)//.*$")


def _statements(cypher: str) -> list[str]:
    """Strip comments, then split the script into statements on ';'.

    Assumes no ';' appears inside a string literal, which holds for this script.
    """
    lines = (_COMMENT.sub("", line) for line in cypher.splitlines())
    return [s.strip() for s in "\n".join(lines).split(";") if s.strip()]


def run(driver: Driver, database: str = "neo4j") -> None:
    """
    Apply the Schema.org label migration to an existing Neo4j database.
    Safe to run multiple times (idempotent via SET and IF NOT EXISTS).
    """
    cypher = MIGRATION_FILE.read_text(encoding="utf-8")
    with driver.session(database=database) as session:
        for stmt in _statements(cypher):
            logger.debug("Running Neo4j migration statement:\n%s", stmt)
            session.run(stmt).consume()
    logger.info("Neo4j Schema.org migration 0010 applied successfully.")
```

## 6. Benefits

**Interoperability.** Any system that understands Schema.org or JSON-LD can consume Flowsint entities without a custom adapter. This makes it straightforward to pipe Flowsint output into SIEM platforms, knowledge graphs, or other OSINT tools.

**Contributor clarity.** When a new enricher author needs to add a type, they have a canonical reference to check first rather than inventing a schema in isolation. The question "does this entity already exist?" has a definitive answer.

**Richer graph queries.** Semantic type labels in Neo4j make type-based queries straightforward, for example matching every `ContactPoint` regardless of whether it originated as an `Email` or a `Phone`. Hierarchy-aware queries such as "find all entities that are subtypes of `schema:Organization`" can build on the published Schema.org hierarchy instead of an explicit type hierarchy maintained in application code, provided the supertypes are also recorded on the nodes.

**Future-proofing.** Schema.org is actively maintained. As the web evolves, and as OSINT increasingly intersects with structured data on the web, Flowsint's type system can evolve with it at little extra cost (see Section 7.3).

**SEO and documentation.** If Flowsint ever exposes a public API or documentation site, Schema.org types are directly understood by search engines, improving discoverability of API documentation.

## 7. Drawbacks and Limitations

**7.1 Schema.org is not designed for OSINT.** Schema.org's primary audience is web publishers marking up content for search engines. Many OSINT-specific concepts (IP geolocation, ASN routing, credential exposure) are outside its scope and must be extensions.
This means Flowsint cannot fully delegate to Schema.org; it must maintain its own namespace for a significant portion of its type system.

**7.2 Property naming differences.** Schema.org uses camelCase property names (`givenName`, `legalName`, `foundingDate`) that may differ from Flowsint's current snake_case Pydantic fields. Mapping between the two requires care to avoid confusion. The `to_jsonld()` method must handle this translation explicitly.

**7.3 Schema.org evolves.** A type present in Schema.org v29 may be renamed, deprecated, or restructured in a future release. Flowsint would need a policy for tracking Schema.org releases and updating mappings accordingly. This is manageable but not zero-cost.

**7.4 Adds conceptual overhead for contributors.** Contributors unfamiliar with Schema.org, JSON-LD, or RDF concepts may find the new machinery confusing. Good documentation and a clear "quick start for adding a new type" guide would mitigate this.

## 8. Alternatives Considered

**8.1 STIX/TAXII.** The [Structured Threat Information eXpression (STIX)](https://oasis-open.github.io/cti-documentation/) standard is the de facto vocabulary in cyber threat intelligence. It covers many of the same entities (IP, Domain, Email, Person, Organization) in a security-native way. STIX was not chosen as the primary mapping for two reasons: (a) it is significantly more verbose and complex than needed for Flowsint's graph-based exploration model, and (b) Schema.org is more broadly understood outside the security community, which Flowsint's journalist and researcher audience will appreciate. A secondary STIX serialization could be offered in a future RFC.

**8.2 Wikidata / Linked Open Data.** Wikidata's property and type system is extremely expressive but also extremely granular and requires deep familiarity with Q-codes. It is not appropriate as a primary vocabulary for an early-stage project.

**8.3 Custom Flowsint ontology (status quo++).** Simply documenting the existing types more thoroughly does not solve the interoperability or semantic ambiguity problems. A custom ontology from scratch would replicate work already done by Schema.org for the overlapping types.

## 9. Migration Path

This change is intended to be **non-breaking and incremental**:

| Phase | Action | Breaking? |
|-------|--------|-----------|
| 1 | Create `flowsint.typing.schema_org` namespace and `SchemaOrgMixin` | No |
| 2 | Implement all entity and extension models under the new namespace | No |
| 3 | Publish `context.jsonld` inside the package and at a stable URL | No |
| 4 | Run PostgreSQL Alembic migration `0010_add_schema_org_type_column` | No |
| 5 | Run Neo4j Cypher migration `0010_add_schema_org_labels.cypher` | No |
| 6 | Add `Accept: application/ld+json` content negotiation to the API | No |
| 7 | (Optional) Rename Pydantic fields in `flowsint-types` to align with Schema.org property names | **Yes** (semver major) |

Phases 1–6 can ship together as a minor version since they are entirely additive: existing models and graph queries keep working, and the database migrations only add a column, properties, labels, and indexes. Phase 7 is a separate, explicitly opt-in migration behind a semver major bump and should only be pursued if the community concludes the naming alignment is worth the breaking change.

## 10. Open Questions

The following questions are raised for community discussion:

1. **Should `flowsint:` extension types also be submitted upstream to Schema.org?** Types like `IPAddress` and `AutonomousSystem` are general enough to be useful beyond OSINT. Submitting proposals to the W3C Schema.org Community Group is feasible but requires sustained engagement.
2. **Where should the `context.jsonld` be hosted?** Options include the repository itself (requiring a versioned URL like `https://raw.githubusercontent.com/reconurge/flowsint/v1.2.0/schema/context.jsonld`), a dedicated domain (`https://schema.flowsint.io/`), or a GitHub Pages deployment. Each has trade-offs in terms of stability and maintenance.
3. **Should STIX be offered as a secondary serialization format?** Given Flowsint's cybersecurity audience, a `to_stix()` method alongside `to_jsonld()` could significantly improve interoperability with professional threat intelligence platforms. This is out of scope for this RFC but recommended as a follow-on.
4. **How should `CryptoWallet` relate to Schema.org's financial types?** Schema.org has `FinancialProduct`, `BankAccount`, and `MoneyAccount`. None are a clean fit for a cryptographic wallet. A thorough review of Schema.org's financial extension vocabulary is warranted before finalising the extension definition.

## 11. Reference Implementation

A reference branch demonstrating the proposed changes across `flowsint-types` (new `flowsint.typing.schema_org` namespace), `flowsint-core` (Neo4j migration helper), and `flowsint-api` (content negotiation middleware) will be linked here once available:

```
Branch: feat/rfc-001-schema-org-namespace
PR:     [to be opened]

Key files introduced:
  flowsint-types/flowsint/typing/schema_org/       ← new namespace
  flowsint-core/flowsint/core/migrations/cypher/
    0010_add_schema_org_labels.cypher
  flowsint-core/flowsint/core/migrations/
    neo4j_schema_org.py
  flowsint-api/alembic/versions/
    0010_add_schema_org_type_column.py
```

## 12. References

- Schema.org Documentation: https://schema.org/docs/documents.html
- Schema.org Full Type Hierarchy: https://schema.org/docs/full.html
- Schema.org Extension Mechanism: https://schema.org/docs/extension.html
- Schema.org Data Model: https://schema.org/docs/datamodel.html
- JSON-LD 1.1 (W3C Recommendation): https://www.w3.org/TR/json-ld11/
- W3C Schema.org Community Group: https://www.w3.org/community/schemaorg
- STIX 2.1 Specification: https://docs.oasis-open.org/cti/stix/v2.1/stix-v2.1.html
- Flowsint Repository: https://github.com/reconurge/flowsint
- Flowsint Types Module: `flowsint-types/`

## 13. Acknowledgements

This RFC was prepared for the Flowsint community. Feedback from maintainers, enricher authors, and OSINT practitioners is actively sought. Please open a GitHub issue referencing `RFC-002` to comment, or submit a pull request against this document.

---

*RFC-002 · Draft · 2026-03-16*