Step 1 — You say "build a prompt"
Architect
Asks you targeted questions in waves to understand what you need. Silently detects the prompt type, audits for gaps, and delivers a production-ready prompt with a scorecard showing what's strong and what was traded off.
"I need a prompt that gets Claude to write brand partnership reports for my creators — they need to feel data-driven but not robotic, and each one should be customised to the brand."
Library lookup
Issue Pareto check
Intent Block created
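The Intent Block itself isn't shown on this page. As a rough sketch, assuming hypothetical field names (none of these come from the tool), it might capture the answers from the question waves as a small structured record:

```python
from dataclasses import dataclass, field

# Hypothetical sketch of an Intent Block. Every field name here is an
# assumption for illustration, not the tool's actual schema.
@dataclass
class IntentBlock:
    goal: str                    # what the prompt must accomplish
    prompt_type: str             # silently detected, e.g. "report-generator"
    audience: str                # who consumes the output
    constraints: list[str] = field(default_factory=list)  # tone, scope, etc.

intent = IntentBlock(
    goal="Brand partnership reports for creators",
    prompt_type="report-generator",
    audience="creators and brand managers",
    constraints=["data-driven but not robotic", "customised per brand"],
)
```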
Step 2 — Diagnosis
Evaluator
Scores the prompt across 4 dimensions (1–5 each). Finds specific, numbered issues — not vague opinions. Challenges the Architect's tradeoffs if they're costing real points. Adds historical context: "this issue type appears in 27% of evaluations."
Delivers: Scores | Numbered Issues (with severity + dimension) | Recommendations | Tradeoff Challenges
"Purpose Alignment: 4.0 — Structural Integrity: 2.5 — Completeness: 3.0 — Resilience: 2.0
Issue #1 [CRITICAL]: No output format specified. Issue #2 [MAJOR]: Generic persona..."
Issue frequency context
Registry write
Issue log auto-created
What carries from Evaluate to Refine
Scores
PA 4.0, SI 2.5, C 3.0, R 2.0 — the Refiner knows exactly which dimensions need surgery and which are already strong.
Numbered Issues
#1 [CRITICAL] No output format — #2 [MAJOR] Generic persona — each tagged with severity and the dimension it's dragging down.
Recommendations
Priority-ordered fixes the Evaluator thinks will have the biggest score impact. The Refiner uses these as its starting plan.
Tradeoff Challenges
Where the Architect made a deliberate tradeoff the Evaluator disputes — the Refiner decides whether to act on or preserve each one.
Issue History
"This type has appeared in 27% of evaluations" — the Refiner knows which problems are chronic vs. one-off, and which resist fixing.
You say "refine" — all of this feeds in automatically. You don't copy anything.
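The handoff above can be pictured as one structured payload. A minimal sketch in Python, with all class and field names assumed for illustration:

```python
from dataclasses import dataclass

# Illustrative sketch of the Evaluation Report that feeds the Refiner.
# The real format is internal to the tool; these names are assumptions.
@dataclass
class Issue:
    number: int
    severity: str        # "CRITICAL", "MAJOR", ...
    dimension: str       # the score it's dragging down
    summary: str
    frequency: float     # share of past evaluations with this issue type

@dataclass
class EvaluationReport:
    scores: dict[str, float]         # 1-5 per dimension
    issues: list[Issue]
    recommendations: list[str]       # priority-ordered fixes
    tradeoff_challenges: list[str]   # disputed Architect tradeoffs

report = EvaluationReport(
    scores={"PA": 4.0, "SI": 2.5, "C": 3.0, "R": 2.0},
    issues=[
        Issue(1, "CRITICAL", "SI", "No output format specified", 0.27),
        Issue(2, "MAJOR", "C", "Generic persona", 0.12),  # frequency illustrative
    ],
    recommendations=["Add structured output format", "Specify domain persona"],
    tradeoff_challenges=["Brevity vs. per-brand customisation"],
)

# The overall score is the mean of the four dimensions: 11.5 / 4 = 2.875,
# which matches the "2.9" quoted in Step 3.
overall = sum(report.scores.values()) / len(report.scores)
```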
▼
Step 3 — Surgery
Refiner
Reads the full Evaluation Report. Builds a plan before touching anything. Each change maps to a numbered issue. Predicts score improvements per dimension. Checks calibration history to correct for its own bias. Runs regression checks — won't break what works to fix what's broken. Signals when gains are exhausted.
Delivers: Refined Prompt | Change Log (issue-mapped) | Score Predictions | Annotated Diff
"Change #1 → Resolves Issue #1 [CRITICAL]: Added structured output format with 6 sections...
Change #2 → Resolves Issue #2 [MAJOR]: Replaced generic persona with domain-specific role...
Predicted: Overall 2.9 → 4.1 | Diminishing returns: GREEN"
Calibration bias check
Issue history lookup
Resolution data filled
Diff generated
Re-evaluate? Loop back ↑
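The issue-mapped change log can be sketched the same way. The before-scores and the 2.9 → 4.1 overall come from the example above; the per-dimension predictions are illustrative values chosen so the mean works out to the quoted 4.1:

```python
from dataclasses import dataclass

# Sketch of an issue-mapped change log; names, and any numbers beyond
# the quoted example, are illustrative assumptions.
@dataclass
class Change:
    number: int
    resolves_issue: int      # maps back to the Evaluator's numbering
    description: str

changes = [
    Change(1, 1, "Added structured output format with 6 sections"),
    Change(2, 2, "Replaced generic persona with domain-specific role"),
]

before = {"PA": 4.0, "SI": 2.5, "C": 3.0, "R": 2.0}
predicted = {"PA": 4.5, "SI": 4.0, "C": 4.0, "R": 3.9}  # illustrative

overall_before = round(sum(before.values()) / 4, 1)     # 2.9
overall_after = round(sum(predicted.values()) / 4, 1)   # 4.1

# Regression check: no dimension is predicted to drop.
assert all(predicted[d] >= before[d] for d in before)
```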
Step 4 — Done
Accept & Archive
Accept the prompt. It's archived in the Prompt Library with scores, intent, refinement history, and tags — so next time the Architect builds something in this domain, it has a reference point.
Library stored
Session completed
Calibration filled
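An archive entry bundling scores, intent, refinement history, and tags could be sketched like this; the schema is assumed, not the tool's actual storage format:

```python
from dataclasses import dataclass, field

# Hypothetical Prompt Library entry -- field names are assumptions.
@dataclass
class ArchiveEntry:
    prompt: str
    intent: str
    final_scores: dict[str, float]
    refinement_history: list[str]    # one line per accepted change
    tags: list[str] = field(default_factory=list)

library: list[ArchiveEntry] = []

library.append(ArchiveEntry(
    prompt="<final refined prompt text>",
    intent="Brand partnership reports for creators",
    final_scores={"PA": 4.5, "SI": 4.0, "C": 4.0, "R": 3.9},
    refinement_history=["#1 output format added", "#2 persona specialised"],
    tags=["reports", "branding"],
))

# Next time the Architect builds in this domain, it has a reference point:
matches = [entry for entry in library if "branding" in entry.tags]
```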