mirror of https://github.com/Shubhamsaboo/awesome-llm-apps.git
synced 2026-03-08 23:13:56 -05:00

chore: rename RAG failure clinic tutorial folder

- Rename `rag_tutorials/wfgy_rag_failure_clinic` to `rag_tutorials/rag_failure_diagnostics_clinic`.
- Keep the existing files in place (README, script, requirements) so that the tutorial sits next to other RAG examples with a framework-agnostic name.
This commit is contained in:
committed by GitHub
parent 49a6fd8933
commit 306397caa7
179
rag_tutorials/rag_failure_diagnostics_clinic/README.md
Normal file
@@ -0,0 +1,179 @@
# RAG Failure Diagnostics Clinic

A small, framework-agnostic **RAG failure diagnostics clinic**.

You paste a real bug description from your LLM + RAG pipeline.
The script asks an LLM to classify the failure into one of several **reusable patterns**
and suggests a **minimal structural fix** (not just “add more context” or “try a better model”).

The goal is to show a pattern-driven way to debug RAG incidents that can be
adapted to any stack: LangChain, LlamaIndex, custom microservices, or in-house infra.

---
## What you will learn

By running this example, you will learn how to:

- Describe **real-world RAG bugs** in plain text so an LLM can reason about them.
- Use a small library of **failure patterns** to triage incidents quickly.
- Ask the model to propose **minimal structural changes** instead of pure prompt tweaks.
- Call an **OpenAI-compatible API** from a small Python script.
- Save each diagnosis into a JSON report for later analysis or post-mortems.

This is not a full framework.
It is a compact **clinic app** that demonstrates a pattern you can adapt in your own stacks.

---
## Folder structure

This tutorial expects the following files in `rag_tutorials/rag_failure_diagnostics_clinic`:

- `README.md` ← this file
- `rag_failure_diagnostics_clinic.py` ← minimal interactive CLI script
- `requirements.txt` ← Python dependencies

The script is completely self-contained.
All pattern definitions and prompts live inside this folder.

---
## Failure patterns (P01–P12)

The clinic uses a small, opinionated set of **12 reusable failure patterns**.
Each bug is mapped to exactly one primary pattern, with optional secondary candidates.

You can modify or extend these patterns to match your own production incidents.

| ID  | Pattern name                                     | Typical symptom                                               |
| --- | ------------------------------------------------ | ------------------------------------------------------------- |
| P01 | Retrieval hallucination / grounding drift        | Answer confidently contradicts retrieved documents.           |
| P02 | Chunk boundary or segmentation bug               | Relevant facts are split or truncated across chunks.          |
| P03 | Embedding mismatch / semantic vs vector distance | Cosine similarity does not match true relevance.              |
| P04 | Index skew or staleness                          | Old or missing data even though source of truth is updated.   |
| P05 | Query rewriting or router misalignment           | Router sends queries to the wrong tool or dataset.            |
| P06 | Long-chain reasoning drift                       | Multi-step tasks gradually lose track of earlier constraints. |
| P07 | Tool-call misuse or ungrounded tools             | Tools are called with wrong arguments or without grounding.   |
| P08 | Session memory leak / missing context            | Conversation loses important facts between turns or sessions. |
| P09 | Evaluation blind spots                           | System passes tests but fails on real incidents.              |
| P10 | Startup ordering / dependency not ready          | Services crash or 5xx during the first minutes after deploy.  |
| P11 | Config or secrets drift across environments      | Works locally, breaks only in staging / prod due to settings. |
| P12 | Multi-tenant / multi-agent interference          | Requests or agents step on each other’s state or resources.   |

The built-in examples roughly correspond to:

- Example 1 → retrieval hallucination / grounding drift (P01 style).
- Example 2 → startup ordering / dependency not ready (P10 style).
- Example 3 → config or secrets drift across environments (P11 style).

You are encouraged to replace these with your own incident snippets.
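The diagnostics script itself is not reproduced in this README, but as a rough sketch: one way to keep the table above as data, so the system prompt and any downstream tooling share a single source of truth. The `FAILURE_PATTERNS` name and the rendering helper are illustrative, not the actual script's internals.

```python
# Hypothetical sketch: the P01-P12 table above as a data structure.
# Names here are illustrative; the real script may organize this differently.
FAILURE_PATTERNS = {
    "P01": "Retrieval hallucination / grounding drift",
    "P02": "Chunk boundary or segmentation bug",
    "P03": "Embedding mismatch / semantic vs vector distance",
    "P04": "Index skew or staleness",
    "P05": "Query rewriting or router misalignment",
    "P06": "Long-chain reasoning drift",
    "P07": "Tool-call misuse or ungrounded tools",
    "P08": "Session memory leak / missing context",
    "P09": "Evaluation blind spots",
    "P10": "Startup ordering / dependency not ready",
    "P11": "Config or secrets drift across environments",
    "P12": "Multi-tenant / multi-agent interference",
}


def patterns_block() -> str:
    """Render the pattern library as plain text for a system prompt."""
    return "\n".join(f"{pid}: {name}" for pid, name in FAILURE_PATTERNS.items())
```

Keeping the patterns in one dict makes it trivial to add a P13 later without editing the prompt text by hand.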
---

## How the clinic works

At a high level:

1. The script builds a **system prompt** that explains the 12 patterns above.
2. You pick one of three built-in examples or paste your own RAG / LLM bug description.
3. The model is asked to:
   - Choose a **primary pattern ID** (P01–P12).
   - Optionally choose up to **two secondary candidates**.
   - Explain the reasoning in short bullet points.
   - Propose a **minimal structural fix** (changes to retrieval, routing, eval, or infra).
4. The full answer is printed to the console and also saved into
   `rag_failure_report.json` together with the original bug text and model name.

The intent is to show how a small **pattern vocabulary + prompt** can turn an LLM
into a lightweight helper for incident triage.
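As a sketch of step 4, assuming a simple flat schema (the actual script's field names may differ), the report write could look like:

```python
import json
from datetime import datetime, timezone


def save_report(bug_text: str, model_name: str, reply: str,
                path: str = "rag_failure_report.json") -> dict:
    """Persist one diagnosis. Field names are an assumption, not the
    script's exact schema; adapt them to your own post-mortem process."""
    report = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model": model_name,
        "bug_description": bug_text,
        "diagnosis": reply,
    }
    with open(path, "w", encoding="utf-8") as f:
        json.dump(report, f, indent=2, ensure_ascii=False)
    return report
```

Because the report is plain JSON, it diffs cleanly in version control and can be grepped during later incident reviews.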
---

## Prerequisites

- Python 3.9 or newer.
- An API key for any **OpenAI-compatible** chat completion endpoint:
  - For example, `OPENAI_API_KEY` for `https://api.openai.com/v1`.
  - Or your own proxy URL set via `OPENAI_BASE_URL`.
- Basic familiarity with RAG pipelines, logs, and failure modes.

---
## Setup

From the root of the `awesome-llm-apps` repo:

```bash
cd rag_tutorials/rag_failure_diagnostics_clinic
pip install -r requirements.txt
```

Minimal `requirements.txt`:

```text
openai>=1.6.0
```

Set your API key as an environment variable (recommended):

```bash
export OPENAI_API_KEY="sk-..."
# optional, if you use a custom endpoint
# export OPENAI_BASE_URL="https://your-proxy.example.com/v1"
# export OPENAI_MODEL="gpt-4o-mini"
```

> Tip: If you prefer Colab, you can also copy the entire
> `rag_failure_diagnostics_clinic.py` file into a single Colab cell and run it there.

---
## Running the clinic

From inside `rag_tutorials/rag_failure_diagnostics_clinic`:

```bash
python rag_failure_diagnostics_clinic.py
```

You will see a simple text UI:

* If `OPENAI_API_KEY` is not set, the script will ask for an API key.
* You can keep the default base URL (`https://api.openai.com/v1`) and model (`gpt-4o`)
  or override them.
* Then you choose:
  * `1` → built-in retrieval hallucination example (P01 style).
  * `2` → startup ordering example (P10 style).
  * `3` → config / secrets drift example (P11 style).
  * `p` → paste your own bug description.

Each run prints a diagnosis and writes a `rag_failure_report.json` file
containing the bug text, model settings, and assistant reply.

You can commit several reports into your own repo as a lightweight
**RAG incident library**.
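Once you have several saved reports, a tiny aggregation like the following can show which patterns dominate your incidents. This is a sketch; it assumes each report is a JSON file whose `diagnosis` text mentions a pattern ID, which is not something the script guarantees.

```python
import collections
import glob
import json


def count_patterns(report_glob: str = "reports/*.json") -> collections.Counter:
    """Count the first P01-P12 ID mentioned in each saved diagnosis.
    Assumes the flat report layout described above; adapt as needed."""
    counts = collections.Counter()
    for path in sorted(glob.glob(report_glob)):
        with open(path, encoding="utf-8") as f:
            report = json.load(f)
        diagnosis = report.get("diagnosis", "")
        # naive: take the first pattern ID that appears in the text
        for n in range(1, 13):
            pid = f"P{n:02d}"
            if pid in diagnosis:
                counts[pid] += 1
                break
    return counts
```

Running this over a quarter's worth of reports is a cheap way to decide which part of the pipeline deserves structural investment first.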
---

## Extending this tutorial

Some ideas for extending this pattern:

* Replace the examples with anonymized incidents from your own logs.
* Add more patterns or split existing ones to match your stack.
* Emit a richer JSON schema (severity, owners, suspected components).
* Plug the reports into an evaluation dashboard or incident tracker.
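For the richer-schema idea, a dataclass is one possible starting point. Every field beyond the bug text and the pattern ID is an assumption you would adapt to your own incident process:

```python
from dataclasses import asdict, dataclass, field


@dataclass
class IncidentReport:
    """Hypothetical richer report schema; adapt field names to your process."""
    bug_description: str
    primary_pattern: str                      # e.g. "P01"
    secondary_patterns: list = field(default_factory=list)
    severity: str = "unknown"                 # e.g. "low" / "medium" / "high"
    owner: str = ""
    suspected_components: list = field(default_factory=list)
```

`asdict(report)` feeds straight into `json.dump`, so the same `rag_failure_report.json` flow keeps working with the richer fields.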
---

## Optional further reading

If you want to see an example of an open source checklist that catalogues RAG failure modes,
one external project you can look at is:

- https://github.com/onestardao/WFGY/blob/main/ProblemMap/README.md

This tutorial is independent of that project.
The link is only for readers who want additional material.
@@ -1,461 +0,0 @@
# WFGY 16 Problem Map RAG Failure Clinic 🩺

An interactive **RAG failure clinic** that helps you debug LLM and RAG pipelines using the **WFGY 16 Problem Map**.
You paste a real bug description, the tool classifies it into **No.1–No.16**, and suggests a **minimal structural fix**, not just a generic prompt tweak.

This tutorial lives under `rag_tutorials/wfgy_rag_failure_clinic` and is fully self-contained.
All extra knowledge comes from the open source WFGY repo on GitHub.

---
## 🧠 What you will learn

By running this example, you will learn how to:

- Use a **problem taxonomy** (the WFGY 16 Problem Map) to classify LLM and RAG failures.
- Turn that taxonomy into a **system prompt** that acts like a semantic firewall.
- Describe **real-world RAG bugs** in plain text so an LLM can reason about them.
- Call any **OpenAI-compatible API** (OpenAI, Nebius, your own proxy, etc.) from a small Python script.
- Map the diagnosis back to concrete docs and checklists in the WFGY Problem Map.

This is not a full framework.
It is a compact **clinic app** that demonstrates a pattern you can adapt in your own stacks.

---
## 📁 Folder structure

This tutorial expects the following files in `rag_tutorials/wfgy_rag_failure_clinic`:

- `README.md` ← this file
- `wfgy_rag_failure_clinic.py` ← minimal interactive CLI / Colab-friendly script
- `requirements.txt` ← Python dependencies

You do **not** need to copy any WFGY content into this repo.
The script loads it directly from the public WFGY GitHub repo:

- WFGY main repo: [github.com/onestardao/WFGY](https://github.com/onestardao/WFGY)
- WFGY Problem Map: [ProblemMap / README](https://github.com/onestardao/WFGY/tree/main/ProblemMap#readme)
- TXTOS prompt file: [OS / TXTOS.txt](https://github.com/onestardao/WFGY/blob/main/OS/TXTOS.txt)

All WFGY assets are released under the MIT License.

---
## ✅ Prerequisites

- Python 3.9 or newer.
- An API key for any **OpenAI-compatible** chat completion endpoint.
  - For example, `OPENAI_API_KEY` for the default `https://api.openai.com/v1`.
  - Or a Nebius key and base URL, or your own compatible proxy.
- Basic familiarity with RAG pipelines, logs, and failure modes.

---
## ⚙️ Setup

From the root of the `awesome-llm-apps` repo:

```bash
cd rag_tutorials/wfgy_rag_failure_clinic
pip install -r requirements.txt
```

Minimal `requirements.txt`:

```text
openai>=1.6.0
requests>=2.31.0
```

Set your API key as an environment variable (recommended):

```bash
export OPENAI_API_KEY="sk-..."
# optional, if you use a custom endpoint
# export OPENAI_BASE_URL="https://your-proxy.example.com/v1"
```

> Tip: If you prefer Colab, you can also copy the entire `wfgy_rag_failure_clinic.py` file into a single Colab cell and run it there. The script is Colab-friendly out of the box.

---
## 🧩 WFGY 16 Problem Map reference

The **WFGY 16 Problem Map** is a checklist of recurring failure modes in LLM and RAG systems.
This clinic treats your bug report as a symptom and maps it into one of these sixteen buckets.

Below is a compact reference table.
Each row links back to the corresponding page in the WFGY repo.

| No. | problem domain (with layer/tags) | what breaks | doc |
| --- | --- | --- | --- |
| 1 | [IN] hallucination & chunk drift {OBS} | retrieval returns wrong or irrelevant content | [hallucination.md](https://github.com/onestardao/WFGY/blob/main/ProblemMap/hallucination.md) |
| 2 | [RE] interpretation collapse {OBS} | chunk is right, logic is wrong | [retrieval-collapse.md](https://github.com/onestardao/WFGY/blob/main/ProblemMap/retrieval-collapse.md) |
| 3 | [RE] long reasoning chains {OBS} | drifts across multi-step tasks | [context-drift.md](https://github.com/onestardao/WFGY/blob/main/ProblemMap/context-drift.md) |
| 4 | [RE] bluffing / overconfidence | confident but unfounded answers | [bluffing.md](https://github.com/onestardao/WFGY/blob/main/ProblemMap/bluffing.md) |
| 5 | [IN] semantic ≠ embedding {OBS} | cosine match does not equal true meaning | [embedding-vs-semantic.md](https://github.com/onestardao/WFGY/blob/main/ProblemMap/embedding-vs-semantic.md) |
| 6 | [RE] logic collapse & recovery {OBS} | dead ends, needs controlled reset | [logic-collapse.md](https://github.com/onestardao/WFGY/blob/main/ProblemMap/logic-collapse.md) |
| 7 | [ST] memory breaks across sessions | lost threads, no continuity | [memory-coherence.md](https://github.com/onestardao/WFGY/blob/main/ProblemMap/memory-coherence.md) |
| 8 | [IN] debugging is a black box {OBS} | no visibility into the failure path | [retrieval-traceability.md](https://github.com/onestardao/WFGY/blob/main/ProblemMap/retrieval-traceability.md) |
| 9 | [ST] entropy collapse {OBS} | attention melts, incoherent output | [entropy-collapse.md](https://github.com/onestardao/WFGY/blob/main/ProblemMap/entropy-collapse.md) |
| 10 | [RE] creative freeze | flat, literal outputs | [creative-freeze.md](https://github.com/onestardao/WFGY/blob/main/ProblemMap/creative-freeze.md) |
| 11 | [RE] symbolic collapse | abstract or logical prompts break | [symbolic-collapse.md](https://github.com/onestardao/WFGY/blob/main/ProblemMap/symbolic-collapse.md) |
| 12 | [RE] philosophical recursion | self-reference loops, paradox traps | [philosophical-recursion.md](https://github.com/onestardao/WFGY/blob/main/ProblemMap/philosophical-recursion.md) |
| 13 | [ST] multi-agent chaos {OBS} | agents overwrite or misalign logic | [Multi-Agent_Problems.md](https://github.com/onestardao/WFGY/blob/main/ProblemMap/Multi-Agent_Problems.md) |
| 14 | [OP] bootstrap ordering | services fire before dependencies are ready | [bootstrap-ordering.md](https://github.com/onestardao/WFGY/blob/main/ProblemMap/bootstrap-ordering.md) |
| 15 | [OP] deployment deadlock | circular waits in infra | [deployment-deadlock.md](https://github.com/onestardao/WFGY/blob/main/ProblemMap/deployment-deadlock.md) |
| 16 | [OP] pre-deploy collapse {OBS} | version skew or missing secret on first call | [predeploy-collapse.md](https://github.com/onestardao/WFGY/blob/main/ProblemMap/predeploy-collapse.md) |

In this tutorial the three built-in examples are mapped as follows:

* Example 1 → **No.1** hallucination and chunk drift.
* Example 2 → **No.14** bootstrap ordering.
* Example 3 → **No.16** pre-deploy collapse and config drift.

For deeper recovery plans and checklists, open the full
[WFGY Problem Map overview](https://github.com/onestardao/WFGY/tree/main/ProblemMap#readme).

---
## 🩻 How the clinic works

At a high level:

1. The script **downloads** two small text files from the WFGY repo:
   * The Problem Map README (for the taxonomy).
   * The TXTOS file (for a stable prompting style).
2. It **builds a system prompt** that:
   * Explains the 16 Problem Map categories.
   * States rules for picking a primary diagnosis and an optional secondary.
   * Reminds the model that examples 1–3 are canonical templates.
3. You pick one of three **ready-made bug examples** or paste your own:
   * Retrieval hallucination around RAG context.
   * Deployment ordering / infra race around vector stores.
   * Pre-deploy secret/config drift.
4. The model returns:
   * A primary **Problem Map number (No.1–No.16)**.
   * An optional secondary candidate.
   * A short explanation and a proposed **minimal structural fix**.
5. You can then open the linked Problem Map doc for a deeper walkthrough of the failure mode and mitigations.

The goal is not to be perfect, but to show how a **problem taxonomy + prompt** can become a lightweight debugging assistant.

---
## 🚀 Running the clinic

From inside `rag_tutorials/wfgy_rag_failure_clinic`:

```bash
python wfgy_rag_failure_clinic.py
```

You will see a simple text UI:

* If `OPENAI_API_KEY` is not set, the script will ask for an API key.
* You can keep the default base URL (`https://api.openai.com/v1`) and model (`gpt-4o`) or override them.
* Then you choose:
  * `1` → built-in retrieval hallucination example (No.1 style).
  * `2` → bootstrap ordering / infra race example (No.14 style).
  * `3` → pre-deploy config drift example (No.16 style).
  * `p` → paste your own bug description.

A truncated sample interaction:

```text
$ python wfgy_rag_failure_clinic.py

Loaded WFGY assets. Ready to debug.

Choose an example or paste your own:
 [1] Example 1 - retrieval hallucination (No.1 style)
 [2] Example 2 - bootstrap ordering / infra race (No.14 style)
 [3] Example 3 - secrets / config drift (No.16 style)
 [p] Paste my own RAG / LLM bug
Your choice: 1

Running diagnosis with model: gpt-4o ...

Primary Problem Map match: No.1 - hallucination & chunk drift
Secondary candidate: No.8 - debugging is a black box

Why:
- Retrieved chunks explicitly say only cards and PayPal are supported.
- The answer confidently invents Bitcoin support.
- Logs show no retrieval or vector errors, so the drift is inside the LLM step.

Minimal structural fix:
- Tighten the answer contract so the model must quote and reason over retrieved snippets.
- Add an explicit "do not invent payment methods" clause in your system prompt.
- Log and surface all retrieval snippets next to the answer so operators can audit future failures.

For the full checklist, see:
https://github.com/onestardao/WFGY/blob/main/ProblemMap/hallucination.md
```

You can repeat the process for as many bugs as you want in a single run.

---
## 🧪 Minimal script (`wfgy_rag_failure_clinic.py`)

Below is a minimal implementation that matches the description above.
Place this in `rag_tutorials/wfgy_rag_failure_clinic/wfgy_rag_failure_clinic.py`.

```python
"""
WFGY RAG Failure Clinic
Minimal interactive demo for the WFGY 16 Problem Map inside awesome-llm-apps.
"""

import os
import textwrap
from getpass import getpass

import requests
from openai import OpenAI

PROBLEM_MAP_URL = "https://raw.githubusercontent.com/onestardao/WFGY/main/ProblemMap/README.md"
TXTOS_URL = "https://raw.githubusercontent.com/onestardao/WFGY/main/OS/TXTOS.txt"
WFGY_PROBLEM_MAP_HOME = "https://github.com/onestardao/WFGY/tree/main/ProblemMap"
WFGY_REPO = "https://github.com/onestardao/WFGY"


EXAMPLE_1 = """=== Example 1 — retrieval hallucination (No.1 style) ===

Context:
You have a simple RAG chatbot that answers questions from a product FAQ.
The FAQ only covers billing rules for your SaaS product and does NOT mention anything about cryptocurrency.

User prompt:
"Can I pay my subscription with Bitcoin?"

Retrieved context (from vector store):
- "We only accept major credit cards and PayPal."
- "All payments are processed in USD."

Model answer:
"Yes, you can pay with Bitcoin. We support several cryptocurrencies through a third-party payment gateway."

Logs:
No errors. Retrieval shows the FAQ chunks above, but the model still confidently invents Bitcoin support.
"""

EXAMPLE_2 = """=== Example 2 — bootstrap ordering / infra race (No.14 style) ===

Context:
You have a RAG API with three services: api-gateway, rag-worker, and vector-db (for example Qdrant or FAISS).
In local docker compose everything works.

Deployment:
In production, services are deployed on Kubernetes.

Symptom:
Right after a fresh deploy, api-gateway returns 500 errors for the first few minutes.
Logs show connection timeouts from api-gateway to vector-db.

After a few minutes, the errors disappear and the system behaves normally.
You suspect a startup race between api-gateway and vector-db but are not sure how to fix it properly.
"""


EXAMPLE_3 = """=== Example 3 — secrets / config drift around first deploy (No.16 style) ===

Context:
You added a new environment variable for the RAG pipeline: SECRET_RAG_KEY.
This is required by middleware that signs outgoing requests to an internal search API.

Local:
On developer machines, SECRET_RAG_KEY is defined in .env and everything works.

Production:
You deployed a new version but forgot to add SECRET_RAG_KEY to the production environment.
The first requests after deploy fail with 500 errors and "missing secret" messages in the logs.

After hot-patching the secret into production, the errors stop.
However, similar "first deploy breaks because of missing config" incidents keep happening.
"""

def fetch_text(url: str) -> str:
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    return resp.text


def build_system_prompt(problem_map: str, txtos: str) -> str:
    header = """
You are an LLM debugger that follows the WFGY 16 Problem Map.

Goal:
Given a description of a bug or failure in an LLM or RAG pipeline, you must:
- Map it to exactly one primary Problem Map number (No.1–No.16).
- Optionally propose one secondary candidate if it is very close.
- Explain your reasoning in plain language.
- Propose a minimal structural fix, not just prompt tweaking.
- When possible, point the user toward the relevant WFGY Problem Map documents.

You are not allowed to invent new problem categories.
You must choose from the sixteen WFGY Problem Map entries only.

About the three built-in examples:
- Example 1 is a clean retrieval hallucination pattern. It should map primarily to No.1.
- Example 2 is a bootstrap ordering or infra race pattern. It should map primarily to No.14.
- Example 3 is a first deploy secrets / config drift pattern. It should map primarily to No.16.
"""
    return (
        textwrap.dedent(header).strip()
        + "\n\n=== TXTOS excerpt ===\n"
        + txtos[:4000]
        + "\n\n=== Problem Map excerpt ===\n"
        + problem_map[:4000]
    )

def load_wfgy_assets() -> str:
    print("Downloading WFGY Problem Map and TXTOS prompt ...")
    problem_map_text = fetch_text(PROBLEM_MAP_URL)
    txtos_text = fetch_text(TXTOS_URL)
    system_prompt = build_system_prompt(problem_map_text, txtos_text)
    print("Loaded WFGY assets. Ready to debug.\n")
    return system_prompt


def make_client_and_model():
    api_key = os.getenv("OPENAI_API_KEY")
    if not api_key:
        api_key = getpass("Enter your OpenAI-compatible API key: ").strip()

    base_url = os.getenv("OPENAI_BASE_URL", "").strip()
    if not base_url:
        base_url = "https://api.openai.com/v1"

    model_name = os.getenv("OPENAI_MODEL", "").strip()
    if not model_name:
        model_name = input("Model name (press Enter for gpt-4o): ").strip() or "gpt-4o"

    client = OpenAI(api_key=api_key, base_url=base_url)
    print(f"\nUsing base URL: {base_url}")
    print(f"Using model: {model_name}\n")
    return client, model_name

def choose_bug_description() -> str:
    print("Choose an example or paste your own bug description:")
    print(" [1] Example 1 — retrieval hallucination (No.1 style)")
    print(" [2] Example 2 — bootstrap ordering / infra race (No.14 style)")
    print(" [3] Example 3 — secrets / config drift (No.16 style)")
    print(" [p] Paste my own RAG / LLM bug\n")

    choice = input("Your choice: ").strip().lower()
    print()

    # The three built-in examples share identical handling.
    examples = {"1": EXAMPLE_1, "2": EXAMPLE_2, "3": EXAMPLE_3}
    if choice in examples:
        bug = examples[choice]
        print(f"You selected Example {choice}. Full bug description:\n")
        print(bug)
        print()
        return bug

    print("Paste your bug description. End with an empty line.")
    lines = []
    while True:
        try:
            line = input()
        except EOFError:
            break
        if not line.strip():
            break
        lines.append(line)

    user_bug = "\n".join(lines).strip()
    if not user_bug:
        print("No bug description detected, aborting this round.\n")
        return ""

    print("\nYou pasted the following bug description:\n")
    print(user_bug)
    print()
    return user_bug

def run_once(client: OpenAI, model_name: str, system_prompt: str) -> None:
    bug = choose_bug_description()
    if not bug:
        return

    print("Running diagnosis ...\n")

    completion = client.chat.completions.create(
        model=model_name,
        temperature=0.2,
        messages=[
            {"role": "system", "content": system_prompt},
            {
                "role": "user",
                "content": (
                    "Here is the bug description. "
                    "Follow the WFGY 16 Problem Map rules described above.\n\n"
                    + bug
                ),
            },
        ],
    )

    reply = completion.choices[0].message.content or ""
    print(reply)
    print("\nFor detailed checklists, visit:")
    print(f"- Problem Map home: {WFGY_PROBLEM_MAP_HOME}")
    print(f"- Full WFGY repo: {WFGY_REPO}\n")


def main():
    system_prompt = load_wfgy_assets()
    client, model_name = make_client_and_model()

    while True:
        run_once(client, model_name, system_prompt)
        again = input("Debug another bug? (y/n): ").strip().lower()
        if again != "y":
            print("Session finished. Goodbye.")
            break
        print()


if __name__ == "__main__":
    main()
```
---

## 🔗 Attribution

* WFGY project: [https://github.com/onestardao/WFGY](https://github.com/onestardao/WFGY)
* Original Problem Map and TXTOS design by the WFGY author.
* This tutorial is a small integration example contributed to `awesome-llm-apps`
  to demonstrate how a **failure taxonomy** can be plugged into an LLM debugging tool.

You are free to adapt this pattern to your own taxonomies, evaluation suites, or internal incident post-mortems.