Create README.md

2026-03-11 17:48:31 -05:00 · 2026-02-09 12:44:00 +08:00
parent 62e5c2b742
commit f0d53c7d13
1 changed files with 433 additions and 0 deletions
--- a/rag_tutorials/wfgy_rag_failure_clinic/README.md
+++ b/rag_tutorials/wfgy_rag_failure_clinic/README.md
@@ -0,0 +1,433 @@
+# WFGY 16 Problem Map RAG Failure Clinic 🩺
+
+An interactive **RAG failure clinic** that helps you debug LLM and RAG pipelines using the **WFGY 16 Problem Map**.  
+You paste a real bug description, the tool classifies it into **No.1–No.16**, and suggests a **minimal structural fix**, not just a generic prompt tweak.
+
+This tutorial lives under `rag_tutorials/wfgy_rag_failure_clinic` and is fully self-contained.  
+All extra knowledge comes from the open source WFGY repo on GitHub.
+
+---
+
+## 🧠 What you will learn
+
+By running this example, you will learn how to:
+
+- Use a **problem taxonomy** (the WFGY 16 Problem Map) to classify LLM and RAG failures.
+- Turn that taxonomy into a **system prompt** that acts like a semantic firewall.
+- Describe **real-world RAG bugs** in plain text so an LLM can reason about them.
+- Call any **OpenAI-compatible API** (OpenAI, Nebius, your own proxy, etc.) from a small Python script.
+- Map the diagnosis back to concrete docs and checklists in the WFGY Problem Map.
+
+This is not a full framework.  
+It is a compact **clinic app** that demonstrates a pattern you can adapt in your own stacks.
+
+---
+
+## 📁 Folder structure
+
+This tutorial expects the following files in `rag_tutorials/wfgy_rag_failure_clinic`:
+
+- `README.md` ← this file
+- `wfgy_rag_failure_clinic.py` ← minimal interactive CLI / Colab-friendly script
+- `requirements.txt` ← Python dependencies
+
+You do **not** need to copy any WFGY content into this repo.  
+The script loads it directly from the public WFGY GitHub repo:
+
+- WFGY main repo: <https://github.com/onestardao/WFGY>  
+- WFGY Problem Map: <https://github.com/onestardao/WFGY/tree/main/ProblemMap#readme>  
+- TXTOS prompt file: <https://github.com/onestardao/WFGY/blob/main/OS/TXTOS.txt>
+
+All WFGY assets are released under the MIT License.
+
+---
+
+## ✅ Prerequisites
+
+- Python 3.9 or newer.
+- An API key for any **OpenAI-compatible** chat completion endpoint.
+  - For example, `OPENAI_API_KEY` for the default `https://api.openai.com/v1`.
+  - Or a Nebius key and base URL, or your own compatible proxy.
+- Basic familiarity with RAG pipelines, logs, and failure modes.
+
+---
+
+## ⚙️ Setup
+
+From the root of the `awesome-llm-apps` repo:
+
+```bash
+cd rag_tutorials/wfgy_rag_failure_clinic
+pip install -r requirements.txt
+````
+
+Example `requirements.txt`:
+
+```text
+openai>=1.6.0
+requests>=2.31.0
+```
+
+Set your API key as an environment variable (recommended):
+
+```bash
+export OPENAI_API_KEY="sk-..."
+# optional, if you use a custom endpoint
+# export OPENAI_BASE_URL="https://your-proxy.example.com/v1"
+```
+
+---
+
+## 🧩 WFGY 16 Problem Map reference
+
+The **WFGY 16 Problem Map** is a checklist of recurring failure modes in LLM and RAG systems.
+This clinic treats your bug report as a symptom and maps it into one of these sixteen buckets.
+
+Below is a compact reference table.
+Each row links back to the corresponding page in the WFGY repo.
+
+| No. | problem domain (with layer/tags)       | what breaks                                   | doc                                                                                                              |
+| --- | -------------------------------------- | --------------------------------------------- | ---------------------------------------------------------------------------------------------------------------- |
+| 1   | [IN] hallucination & chunk drift {OBS} | retrieval returns wrong or irrelevant content | [hallucination.md](https://github.com/onestardao/WFGY/blob/main/ProblemMap/hallucination.md)                     |
+| 2   | [RE] interpretation collapse {OBS}     | chunk is right, logic is wrong                | [retrieval-collapse.md](https://github.com/onestardao/WFGY/blob/main/ProblemMap/retrieval-collapse.md)           |
+| 3   | [RE] long reasoning chains {OBS}       | drifts across multi-step tasks                | [context-drift.md](https://github.com/onestardao/WFGY/blob/main/ProblemMap/context-drift.md)                     |
+| 4   | [RE] bluffing / overconfidence         | confident but unfounded answers               | [bluffing.md](https://github.com/onestardao/WFGY/blob/main/ProblemMap/bluffing.md)                               |
+| 5   | [IN] semantic ≠ embedding {OBS}        | cosine match does not equal true meaning      | [embedding-vs-semantic.md](https://github.com/onestardao/WFGY/blob/main/ProblemMap/embedding-vs-semantic.md)     |
+| 6   | [RE] logic collapse & recovery {OBS}   | dead ends, needs controlled reset             | [logic-collapse.md](https://github.com/onestardao/WFGY/blob/main/ProblemMap/logic-collapse.md)                   |
+| 7   | [ST] memory breaks across sessions     | lost threads, no continuity                   | [memory-coherence.md](https://github.com/onestardao/WFGY/blob/main/ProblemMap/memory-coherence.md)               |
+| 8   | [IN] debugging is a black box {OBS}    | no visibility into the failure path           | [retrieval-traceability.md](https://github.com/onestardao/WFGY/blob/main/ProblemMap/retrieval-traceability.md)   |
+| 9   | [ST] entropy collapse {OBS}            | attention melts, incoherent output            | [entropy-collapse.md](https://github.com/onestardao/WFGY/blob/main/ProblemMap/entropy-collapse.md)               |
+| 10  | [RE] creative freeze                   | flat, literal outputs                         | [creative-freeze.md](https://github.com/onestardao/WFGY/blob/main/ProblemMap/creative-freeze.md)                 |
+| 11  | [RE] symbolic collapse                 | abstract or logical prompts break             | [symbolic-collapse.md](https://github.com/onestardao/WFGY/blob/main/ProblemMap/symbolic-collapse.md)             |
+| 12  | [RE] philosophical recursion           | self-reference loops, paradox traps           | [philosophical-recursion.md](https://github.com/onestardao/WFGY/blob/main/ProblemMap/philosophical-recursion.md) |
+| 13  | [ST] multi-agent chaos {OBS}           | agents overwrite or misalign logic            | [Multi-Agent_Problems.md](https://github.com/onestardao/WFGY/blob/main/ProblemMap/Multi-Agent_Problems.md)       |
+| 14  | [OP] bootstrap ordering                | services fire before dependencies are ready   | [bootstrap-ordering.md](https://github.com/onestardao/WFGY/blob/main/ProblemMap/bootstrap-ordering.md)           |
+| 15  | [OP] deployment deadlock               | circular waits in infra                       | [deployment-deadlock.md](https://github.com/onestardao/WFGY/blob/main/ProblemMap/deployment-deadlock.md)         |
+| 16  | [OP] pre-deploy collapse {OBS}         | version skew or missing secret on first call  | [predeploy-collapse.md](https://github.com/onestardao/WFGY/blob/main/ProblemMap/predeploy-collapse.md)           |
+
+In this tutorial the three built-in examples are mapped as follows:
+
+* Example 1 → **No.1** hallucination and chunk drift.
+* Example 2 → **No.14** bootstrap ordering.
+* Example 3 → **No.16** pre-deploy collapse and config drift.
+
+For deeper recovery plans and checklists, open the full Problem Map:
+
+* [https://github.com/onestardao/WFGY/tree/main/ProblemMap](https://github.com/onestardao/WFGY/tree/main/ProblemMap#readme)
+
+---
+
+## 🩻 How the clinic works
+
+At a high level:
+
+1. The script **downloads** two small text files from the WFGY repo:
+
+   * The Problem Map README (for the taxonomy).
+   * The TXTOS file (for a stable prompting style).
+2. It **builds a system prompt** that:
+
+   * Explains the 16 Problem Map categories.
+   * States rules for picking a primary diagnosis and an optional secondary.
+   * Reminds the model that examples 1–3 are canonical templates.
+3. You pick one of three **ready-made bug examples** or paste your own:
+
+   * Retrieval hallucination around RAG context.
+   * Deployment ordering / infra race around vector stores.
+   * Pre-deploy secret/config drift.
+4. The model returns:
+
+   * A primary **Problem Map number (No.1–No.16)**.
+   * An optional secondary candidate.
+   * A short explanation and a proposed **minimal structural fix**.
+5. You can then open the linked Problem Map doc for a deeper walkthrough of the failure mode and mitigations.
+
+The goal is not to be perfect, but to show how a **problem taxonomy + prompt** can become a lightweight debugging assistant.
+
+---
+
+## 🚀 Running the clinic
+
+From inside `rag_tutorials/wfgy_rag_failure_clinic`:
+
+```bash
+python wfgy_rag_failure_clinic.py
+```
+
+You will see a simple text UI:
+
+* If `OPENAI_API_KEY` is not set, the script will ask for an API key.
+* You can keep the default base URL (`https://api.openai.com/v1`) and model (`gpt-4o`) or override them.
+* Then you choose:
+
+  * `1` → built-in retrieval hallucination example (No.1 style).
+  * `2` → bootstrap ordering / infra race example (No.14 style).
+  * `3` → pre-deploy config drift example (No.16 style).
+  * `p` → paste your own bug description.
+
+A truncated sample interaction:
+
+```text
+$ python wfgy_rag_failure_clinic.py
+
+Loaded WFGY assets. Ready to debug.
+
+Choose an example or paste your own:
+  [1] Example 1 — retrieval hallucination (No.1 style)
+  [2] Example 2 — bootstrap ordering / infra race (No.14 style)
+  [3] Example 3 — secrets / config drift (No.16 style)
+  [p] Paste my own RAG / LLM bug
+Your choice: 1
+
+Running diagnosis with model: gpt-4o ...
+
+Primary Problem Map match: No.1 — hallucination & chunk drift
+Secondary candidate: No.8 — debugging is a black box
+
+Why:
+- Retrieved chunks explicitly say only cards and PayPal are supported.
+- The answer confidently invents Bitcoin support.
+- Logs show no retrieval or vector errors, so the drift is inside the LLM step.
+
+Minimal structural fix:
+- Tighten the answer contract so the model must quote and reason over retrieved snippets.
+- Add an explicit "do not invent payment methods" clause in your system prompt.
+- Log and surface all retrieval snippets next to the answer so operators can audit future failures.
+
+For the full checklist, see:
+https://github.com/onestardao/WFGY/blob/main/ProblemMap/hallucination.md
+```
+
+You can repeat the process for as many bugs as you want in a single run.
+
+---
+
+## 🧪 Minimal script (`wfgy_rag_failure_clinic.py`)
+
+Below is a minimal implementation that matches the description above.
+Place this in `rag_tutorials/wfgy_rag_failure_clinic/wfgy_rag_failure_clinic.py`.
+
+```python
+import os
+import textwrap
+from getpass import getpass
+
+import requests
+from openai import OpenAI
+
+PROBLEM_MAP_URL = "https://raw.githubusercontent.com/onestardao/WFGY/main/ProblemMap/README.md"
+TXTOS_URL = "https://raw.githubusercontent.com/onestardao/WFGY/main/OS/TXTOS.txt"
+WFGY_PROBLEM_MAP_HOME = "https://github.com/onestardao/WFGY/tree/main/ProblemMap"
+WFGY_REPO = "https://github.com/onestardao/WFGY"
+
+
+EXAMPLE_1 = """=== Example 1 — retrieval hallucination (No.1 style) ===
+
+Context:
+You have a simple RAG chatbot that answers questions from a product FAQ.
+The FAQ only covers billing rules for your SaaS product and does NOT mention anything about cryptocurrency.
+
+User prompt:
+"Can I pay my subscription with Bitcoin?"
+
+Retrieved context (from vector store):
+- "We only accept major credit cards and PayPal."
+- "All payments are processed in USD."
+
+Model answer:
+"Yes, you can pay with Bitcoin. We support several cryptocurrencies through a third-party payment gateway."
+
+Logs:
+No errors. Retrieval shows the FAQ chunks above, but the model still confidently invents Bitcoin support.
+"""
+
+
+EXAMPLE_2 = """=== Example 2 — bootstrap ordering / infra race (No.14 style) ===
+
+Context:
+You have a RAG API with three services: api-gateway, rag-worker, and vector-db (for example Qdrant or FAISS).
+In local docker compose everything works.
+
+Deployment:
+In production, services are deployed on Kubernetes.
+
+Symptom:
+Right after a fresh deploy, api-gateway returns 500 errors for the first few minutes.
+Logs show connection timeouts from api-gateway to vector-db.
+
+After a few minutes, the errors disappear and the system behaves normally.
+You suspect a startup race between api-gateway and vector-db but are not sure how to fix it properly.
+"""
+
+
+EXAMPLE_3 = """=== Example 3 — secrets / config drift around first deploy (No.16 style) ===
+
+Context:
+You added a new environment variable for the RAG pipeline: SECRET_RAG_KEY.
+This is required by middleware that signs outgoing requests to an internal search API.
+
+Local:
+On developer machines, SECRET_RAG_KEY is defined in .env and everything works.
+
+Production:
+You deployed a new version but forgot to add SECRET_RAG_KEY to the production environment.
+The first requests after deploy fail with 500 errors and "missing secret" messages in the logs.
+
+After hot-patching the secret into production, the errors stop.
+However, similar "first deploy breaks because of missing config" incidents keep happening.
+"""
+
+
+def fetch_text(url: str) -> str:
+    resp = requests.get(url, timeout=30)
+    resp.raise_for_status()
+    return resp.text
+
+
+def build_system_prompt(problem_map: str, txtos: str) -> str:
+    header = """
+You are an LLM debugger that follows the WFGY 16 Problem Map.
+
+Goal:
+Given a description of a bug or failure in an LLM or RAG pipeline, you must:
+- Map it to exactly one primary Problem Map number (No.1–No.16).
+- Optionally propose one secondary candidate if it is very close.
+- Explain your reasoning in plain language.
+- Propose a minimal structural fix, not just prompt tweaking.
+- When possible, point the user toward the relevant WFGY Problem Map documents.
+
+You are not allowed to invent new problem categories.
+You must choose from the sixteen WFGY Problem Map entries only.
+
+About the three built-in examples:
+- Example 1 is a clean retrieval hallucination pattern. It should map primarily to No.1.
+- Example 2 is a bootstrap ordering or infra race pattern. It should map primarily to No.14.
+- Example 3 is a first deploy secrets / config drift pattern. It should map primarily to No.16.
+"""
+    return (
+        textwrap.dedent(header).strip()
+        + "\n\n=== TXTOS excerpt ===\n"
+        + txtos[:4000]
+        + "\n\n=== Problem Map excerpt ===\n"
+        + problem_map[:4000]
+    )
+
+
+def load_wfgy_assets() -> str:
+    print("Downloading WFGY Problem Map and TXTOS prompt ...")
+    problem_map_text = fetch_text(PROBLEM_MAP_URL)
+    txtos_text = fetch_text(TXTOS_URL)
+    system_prompt = build_system_prompt(problem_map_text, txtos_text)
+    print("Loaded WFGY assets. Ready to debug.\n")
+    return system_prompt
+
+
+def make_client_and_model():
+    api_key = os.getenv("OPENAI_API_KEY")
+    if not api_key:
+        api_key = getpass("Enter your OpenAI-compatible API key: ").strip()
+
+    base_url = os.getenv("OPENAI_BASE_URL", "").strip()
+    if not base_url:
+        base_url = "https://api.openai.com/v1"
+
+    model_name = os.getenv("OPENAI_MODEL", "").strip()
+    if not model_name:
+        model_name = input("Model name (press Enter for gpt-4o): ").strip() or "gpt-4o"
+
+    client = OpenAI(api_key=api_key, base_url=base_url)
+    print(f"\nUsing base URL: {base_url}")
+    print(f"Using model: {model_name}\n")
+    return client, model_name
+
+
+def choose_bug_description() -> str:
+    print("Choose an example or paste your own bug description:")
+    print("  [1] Example 1 — retrieval hallucination (No.1 style)")
+    print("  [2] Example 2 — bootstrap ordering / infra race (No.14 style)")
+    print("  [3] Example 3 — secrets / config drift (No.16 style)")
+    print("  [p] Paste my own RAG / LLM bug\n")
+
+    choice = input("Your choice: ").strip().lower()
+    print()
+
+    if choice == "1":
+        return EXAMPLE_1
+    if choice == "2":
+        return EXAMPLE_2
+    if choice == "3":
+        return EXAMPLE_3
+
+    print("Paste your bug description. End with an empty line.")
+    lines = []
+    while True:
+        try:
+            line = input()
+        except EOFError:
+            break
+        if not line.strip():
+            break
+        lines.append(line)
+    user_bug = "\n".join(lines).strip()
+    if not user_bug:
+        print("No bug description detected, aborting this round.\n")
+    return user_bug
+
+
+def run_once(client: OpenAI, model_name: str, system_prompt: str) -> None:
+    bug = choose_bug_description()
+    if not bug:
+        return
+
+    print("Running diagnosis ...\n")
+
+    completion = client.chat.completions.create(
+        model=model_name,
+        temperature=0.2,
+        messages=[
+            {"role": "system", "content": system_prompt},
+            {
+                "role": "user",
+                "content": (
+                    "Here is the bug description. "
+                    "Follow the WFGY 16 Problem Map rules described above.\n\n"
+                    + bug
+                ),
+            },
+        ],
+    )
+
+    reply = completion.choices[0].message.content or ""
+    print(reply)
+    print("\nFor detailed checklists, visit:")
+    print(f"- Problem Map home: {WFGY_PROBLEM_MAP_HOME}")
+    print(f"- Full WFGY repo:   {WFGY_REPO}\n")
+
+
+def main():
+    system_prompt = load_wfgy_assets()
+    client, model_name = make_client_and_model()
+
+    while True:
+        run_once(client, model_name, system_prompt)
+        again = input("Debug another bug? (y/n): ").strip().lower()
+        if again != "y":
+            print("Session finished. Goodbye.")
+            break
+        print()
+
+
+if __name__ == "__main__":
+    main()
+```
+
+---
+
+## 🔗 Attribution
+
+* WFGY project: [https://github.com/onestardao/WFGY](https://github.com/onestardao/WFGY)
+* Original Problem Map and TXTOS design by the WFGY author.
+* This tutorial is a small integration example contributed to `awesome-llm-apps`
+  to demonstrate how a **failure taxonomy** can be plugged into an LLM debugging tool.
+
+You are free to adapt this pattern to your own taxonomies, evaluation suites, or internal incident post-mortems.