[GH-ISSUE #94] Configurable API-Based Enricher Type [with YAML templates] #756

Open
opened 2026-04-24 17:46:35 -05:00 by GiteaMirror · 4 comments

Originally created by @xhzeem on GitHub (Dec 11, 2025).
Original GitHub issue: https://github.com/reconurge/flowsint/issues/94

It would be very helpful to introduce a new category of enrichers that operate through simple, configurable API requests. The idea is to allow users to define an API call using a YAML-based template, including request parameters, variable placeholders, and response-mapping logic that aligns the returned data to a specific type within Flowsint.

This addition would enable the creation of a broad marketplace of web-based API enrichers, significantly simplifying the process of designing and implementing new integrations. Instead of writing custom code, users could quickly assemble enrichers by describing the request/response flow in a declarative format.

From a security standpoint, this feature should be limited to basic, controlled web requests with safe encoding/decoding utilities. The system must prevent any functionality that could introduce remote code execution risks or otherwise compromise the server environment.
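To make the proposal concrete, here is a minimal sketch of what such a template and its response mapping might look like. The template is written as the Python dict a YAML loader would produce. Every key name (`request`, `response`, `mapping`), the type names, and the `api.example.com` URL are assumptions for illustration, not an existing Flowsint schema.

```python
from typing import Any

# Hypothetical template, shown as the dict a YAML loader would produce.
# All key names, type names, and the URL are illustrative assumptions.
TEMPLATE = {
    "id": "example-ip-lookup",
    "input_type": "Ip",
    "output_type": "Location",
    "request": {
        "method": "GET",
        "url": "https://api.example.com/v1/ip/{{ip}}",
    },
    "response": {
        "format": "json",
        # output field -> dotted path into the parsed response
        "mapping": {"country": "geo.country", "city": "geo.city"},
    },
}


def get_path(data: Any, dotted: str) -> Any:
    """Follow a dotted path through nested dicts; return None if a step is missing."""
    current = data
    for key in dotted.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def map_response(template: dict, parsed: dict) -> dict:
    """Build the output object from the mapped fields only."""
    mapping = template["response"]["mapping"]
    return {field: get_path(parsed, path) for field, path in mapping.items()}
```

The mapping is pure data lookup: fields not named in the template never reach the output, and nothing in the template is evaluated as code.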


@dextmorgn commented on GitHub (Dec 12, 2025):

hey @xhzeem, kind of Nuclei-like?


@xhzeem commented on GitHub (Dec 12, 2025):

Yes, something close to that, for API-based or extraction-only enrichers.


@xhzeem commented on GitHub (Jan 25, 2026):

These are key security considerations I wrote from a security engineer’s perspective (with some AI assistance) that should be helpful to keep in mind while implementing this feature.

Safe-by-default guardrails (without over-restricting the feature)

The aim is to keep the feature flexible and template-driven but safe by default, without over-restricting things like headers.

1) Prevent internal network access (SSRF protection)

  • Block requests to localhost, private IP ranges, link-local, and cloud metadata addresses.
  • Resolve DNS, validate the resolved IP, and connect to that same validated IP (protects against DNS rebinding).
  • If redirects are allowed, re-validate the destination host/IP on every hop.
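The address checks above can be sketched with Python's standard `ipaddress` and `socket` modules. The function names are hypothetical. This sketch assumes the caller then connects to one of the returned addresses instead of resolving the name a second time, and repeats the check for every redirect target.

```python
import ipaddress
import socket


def is_public_ip(address: str) -> bool:
    """Return True only for globally routable, non-multicast addresses."""
    ip = ipaddress.ip_address(address)
    # Unwrap IPv4-mapped IPv6 (e.g. ::ffff:127.0.0.1) before checking.
    mapped = getattr(ip, "ipv4_mapped", None)
    if mapped is not None:
        ip = mapped
    # is_global is False for loopback, private, and link-local ranges;
    # link-local includes the 169.254.169.254 metadata address.
    return ip.is_global and not ip.is_multicast


def resolve_public(host: str, port: int) -> list[str]:
    """Resolve host; reject it if any resolved address is non-public."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    addresses = sorted({info[4][0] for info in infos})
    for address in addresses:
        if not is_public_ip(address):
            raise ValueError(f"{host} resolves to non-public address {address}")
    return addresses
```

Rejecting the host when any resolved address is non-public is a deliberately strict choice; a looser policy could filter the list instead.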

2) Keep templates declarative (no code execution surface)

  • Templates can describe: request(s), variables, matchers/extractors, and response→type mapping.
  • No scripting, no eval, no “expressions” that become a mini programming language.
  • Allow simple placeholder substitution plus a small set of safe transforms (e.g., urlencode/base64, basic string casing/trim).
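A minimal sketch of placeholder substitution with an allowlist of transforms follows. The `{{ name | transform }}` syntax and the transform names are assumptions; the thread does not fix a syntax.

```python
import base64
import re
from urllib.parse import quote

# Allowlisted transforms: fixed str -> str functions, nothing user-defined.
TRANSFORMS = {
    "urlencode": lambda s: quote(s, safe=""),
    "base64": lambda s: base64.b64encode(s.encode("utf-8")).decode("ascii"),
    "lower": str.lower,
    "upper": str.upper,
    "trim": str.strip,
}

# Matches {{ name }} or {{ name | transform }}; this syntax is an assumption.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*(?:\|\s*(\w+)\s*)?\}\}")


def render(template: str, variables: dict[str, str]) -> str:
    """Substitute placeholders in one pass; unknown names or transforms are errors."""

    def replace(match: re.Match) -> str:
        name, transform = match.group(1), match.group(2)
        if name not in variables:
            raise ValueError(f"unknown variable: {name}")
        value = variables[name]
        if transform is not None:
            if transform not in TRANSFORMS:
                raise ValueError(f"unknown transform: {transform}")
            value = TRANSFORMS[transform](value)
        return value

    return PLACEHOLDER.sub(replace, template)
```

Because substitution happens in a single pass, a variable whose value itself contains `{{...}}` is not expanded again, which keeps user input from turning into template syntax.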

3) Allow custom headers & payloads, but block a tiny set of dangerous ones

  • Let users set most headers and body formats needed for real APIs.
  • Still hard-block or override only “protocol control” headers that enable request smuggling/proxy tricks (e.g., Connection, Transfer-Encoding, Proxy-*) and prevent manual Host mismatch.
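The header policy could be enforced with a small check like the one below. This is a sketch: the blocked names come from the list above, while the function name and the choice to reject (rather than silently drop) are assumptions.

```python
from urllib.parse import urlsplit

# Headers the HTTP client should control itself (names taken from the list above).
BLOCKED_HEADERS = {"connection", "transfer-encoding"}
BLOCKED_PREFIXES = ("proxy-",)


def check_headers(url: str, headers: dict[str, str]) -> dict[str, str]:
    """Return the headers unchanged if all are allowed, else raise ValueError."""
    url_netloc = urlsplit(url).netloc.lower()
    for name, value in headers.items():
        lowered = name.strip().lower()
        if lowered in BLOCKED_HEADERS or lowered.startswith(BLOCKED_PREFIXES):
            raise ValueError(f"header not allowed in templates: {name}")
        # A manual Host header is tolerated only if it matches the URL's host[:port].
        if lowered == "host" and value.strip().lower() != url_netloc:
            raise ValueError("Host header does not match the request URL")
    return headers
```

Everything not on the short blocklist passes through, so API-specific headers such as custom auth or content-type headers stay usable.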

4) Safe auth + secret handling

  • Support secrets via secret_ref (recommended).
  • Redact secrets in logs automatically (Authorization, cookies, API keys).
  • Prevent accidental exfiltration: don’t let templates “return secrets” unless explicitly allowed by policy.
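One way the `secret_ref` idea and log redaction might fit together is sketched below. The `{"secret_ref": name}` value shape, the in-memory `secret_store` dict, and the list of sensitive header names are all assumptions for illustration.

```python
SENSITIVE_HEADERS = {"authorization", "cookie", "set-cookie", "x-api-key"}
REDACTED = "[REDACTED]"


def resolve_secrets(headers: dict, secret_store: dict[str, str]) -> dict[str, str]:
    """Replace {"secret_ref": name} values with the secret from the store."""
    resolved = {}
    for name, value in headers.items():
        if isinstance(value, dict) and "secret_ref" in value:
            ref = value["secret_ref"]
            if ref not in secret_store:
                raise KeyError(f"unknown secret_ref: {ref}")
            value = secret_store[ref]
        resolved[name] = value
    return resolved


def redact_for_log(headers: dict[str, str], secret_values: set[str]) -> dict[str, str]:
    """Copy headers for logging, masking sensitive names and any known secret value."""
    safe = {}
    for name, value in headers.items():
        if name.lower() in SENSITIVE_HEADERS:
            safe[name] = REDACTED
            continue
        for secret in secret_values:
            if secret:  # skip empty strings, which would match everywhere
                value = value.replace(secret, REDACTED)
        safe[name] = value
    return safe
```

The same value-based masking could be applied to mapped output fields to catch a template that tries to return a secret, though that policy is not shown here.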

5) Abuse/DoS controls that don’t reduce functionality

  • Tight timeouts (connect + total).
  • Response size limits.
  • Bounded retries with backoff (only on safe failure codes).
  • Per-tenant rate limits + concurrency caps.
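Two of these controls, bounded retries and response size limits, are sketched below using only the standard library. The retryable status set, the default delays, and the function names are assumptions; timeouts and per-tenant rate limits depend on the HTTP client and deployment and are not shown.

```python
import io

RETRYABLE_STATUS = {429, 502, 503, 504}  # assumed set of "safe" failure codes


def backoff_delays(max_retries: int, base: float = 0.5, cap: float = 8.0) -> list[float]:
    """Exponential backoff schedule: base, 2*base, 4*base, ... capped at `cap`."""
    return [min(cap, base * (2 ** attempt)) for attempt in range(max_retries)]


def should_retry(status: int, attempt: int, max_retries: int) -> bool:
    """Retry only on allowlisted status codes and only while attempts remain."""
    return status in RETRYABLE_STATUS and attempt < max_retries


def read_limited(stream: io.BufferedIOBase, max_bytes: int, chunk_size: int = 8192) -> bytes:
    """Read a body in chunks and abort as soon as it exceeds max_bytes."""
    chunks = []
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
        if total > max_bytes:
            raise ValueError(f"response exceeds {max_bytes} bytes")
        chunks.append(chunk)
    return b"".join(chunks)
```

Reading in chunks means an oversized body is rejected after at most `max_bytes` plus one chunk has been buffered, regardless of what the `Content-Length` header claims.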

6) Response parsing and mapping that stays safe

  • Explicit response parsing mode (json/text) rather than “auto”.
  • Matchers/extractors are allowed, but bounded (max response size, max regex complexity if regex is supported).
  • Output must validate against the expected typed schema (so templates can’t emit arbitrary junk).
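A rough sketch of explicit parse modes and typed output validation is shown here. The schema format (field name mapped to an expected type and a required flag) and the `LOCATION_SCHEMA` example are hypothetical. Bounding regex complexity is a separate concern and is not covered.

```python
import json
from typing import Any

# Hypothetical output schema: field name -> (expected type, required?)
LOCATION_SCHEMA = {
    "country": (str, True),
    "city": (str, False),
    "latitude": (float, False),
}


def parse_body(mode: str, body: str) -> Any:
    """Parse with an explicitly declared mode; there is no auto-detection."""
    if mode == "json":
        return json.loads(body)
    if mode == "text":
        return body
    raise ValueError(f"unsupported parse mode: {mode}")


def validate_output(obj: dict, schema: dict) -> dict:
    """Reject unknown fields, missing required fields, and wrong types."""
    unknown = set(obj) - set(schema)
    if unknown:
        raise ValueError(f"unexpected fields: {sorted(unknown)}")
    for field, (expected, required) in schema.items():
        if field not in obj or obj[field] is None:
            if required:
                raise ValueError(f"missing required field: {field}")
            continue
        value = obj[field]
        # bool is a subclass of int in Python, so treat it as its own type.
        is_stray_bool = isinstance(value, bool) and expected is not bool
        if is_stray_bool or not isinstance(value, expected):
            raise ValueError(f"field {field} should be {expected.__name__}")
    return obj
```

Rejecting unknown fields, instead of dropping them, makes template mistakes visible early.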

7) Observability + audit trail

  • Log only safe metadata (method/host/path/status/latency/template id).
  • Version templates + audit who changed what and when.
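As an illustration of logging only safe metadata, the sketch below builds a log record from the URL's host and path and drops the query string and any userinfo, where API keys often end up. The function and field names are assumptions.

```python
from urllib.parse import urlsplit


def safe_log_record(method: str, url: str, status: int,
                    latency_ms: float, template_id: str) -> dict:
    """Build a log entry from the URL's host and path only."""
    parts = urlsplit(url)
    return {
        "method": method.upper(),
        "host": parts.hostname,      # excludes user:password@ and the port
        "path": parts.path or "/",   # excludes the query string and fragment
        "status": status,
        "latency_ms": round(latency_ms, 1),
        "template_id": template_id,
    }
```

Some APIs put identifiers or tokens in the path as well, so a stricter deployment might log a path template rather than the rendered path.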

8) Safe data-type handlers (parse anything into an object, safely)

To let templates consume many response formats (and reuse the parsed object later), support multiple data type handlers with strict “safe parsing” rules:

  • Supported handlers (example): json, urlencoded, csv, xml, html, maybe yaml (careful).
  • Each handler outputs a normalized object model (e.g., map/list/string/number/bool) that can be referenced later in mappings/extractors.

Critical security rules:

  • XML/HTML:

    • disable DTDs and external entities entirely (prevents XXE)
    • disable external resource fetching
    • enforce secure parser settings (no entity expansion / no network)
  • YAML (if supported):

    • safe loader only (no tags that construct objects)
  • All formats:

    • hard limits on input size, nesting depth, number of nodes/keys, and string lengths (prevents “billion laughs”-style bombs and memory blowups)
    • explicit encoding rules (UTF-8, reject/normalize invalid encodings)
    • never auto-execute anything (no “deserialize into classes”, no dynamic type instantiation)

This way, you can safely say: “any response type can be parsed into an object for later use,” while keeping the system protected from XXE, entity expansion, parser bombs, and unsafe deserialization.
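The "all formats" limits can be illustrated for the JSON handler with the standard library alone. This is a sketch under assumed default limits and hypothetical names. XML and HTML would additionally need a parser configured to refuse DTDs and external entities, which is not shown here.

```python
import json
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Limits:
    # Default values are illustrative assumptions, not recommendations.
    max_bytes: int = 1_000_000
    max_depth: int = 32
    max_nodes: int = 100_000
    max_string: int = 65_536


def parse_json_bounded(raw: bytes, limits: Limits = Limits()) -> Any:
    """Decode strict UTF-8 JSON into plain dict/list/str/number/bool/None values."""
    if len(raw) > limits.max_bytes:
        raise ValueError("input too large")
    try:
        text = raw.decode("utf-8")  # strict: invalid byte sequences raise
        data = json.loads(text)
    except (UnicodeDecodeError, json.JSONDecodeError, RecursionError) as exc:
        raise ValueError(f"invalid input: {exc}") from exc

    # Iterative walk, so the limit check itself cannot exhaust the call stack.
    stack = [(data, 1)]
    nodes = 0
    while stack:
        value, depth = stack.pop()
        nodes += 1
        if nodes > limits.max_nodes:
            raise ValueError("too many nodes")
        if depth > limits.max_depth:
            raise ValueError("nesting too deep")
        if isinstance(value, str) and len(value) > limits.max_string:
            raise ValueError("string too long")
        if isinstance(value, dict):
            for key, child in value.items():
                if len(key) > limits.max_string:
                    raise ValueError("key too long")
                stack.append((child, depth + 1))
        elif isinstance(value, list):
            stack.extend((child, depth + 1) for child in value)
    return data
```

Because `json.loads` is called without hooks, the result contains only built-in types, which matches the "no deserialize into classes" rule above.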


Minimal “v1” philosophy

Flexible requests + flexible mapping, but with strong boundaries on:

  • where requests can go (public internet only by default),
  • no code execution,
  • bounded resources (timeouts/size/rate),
  • safe secret handling.

If you share your current draft YAML fields (even just the top-level keys), I can suggest a compact schema that matches this philosophy and is easy to validate.


@jonathafernandez85-tech commented on GitHub (Feb 17, 2026):

servicios@enviospremiuncorreos.es

Reference: github-starred/flowsint#756