HUNTER: Root Cause, Not Guessing
The most common way engineers use AI for debugging is also the least effective: paste an error message, ask for a fix, apply the fix, see if it worked. This approach treats debugging as guessing with autocomplete. Sometimes it works. More often it leads to a succession of patches that address symptoms without reaching the root cause.
The systematic debugging skill changes the approach entirely. Evidence before hypothesis. Narrow before fix. Verify before close.
The Core Failure Mode of AI-Assisted Debugging
When you paste an error and ask Claude "what is wrong?", Claude does what language models do: it generates the most probable explanation given the text it received. If the error is common, the explanation is usually right. If the error is unusual, or if the root cause is context-specific, Claude generates a confident-sounding explanation that addresses the surface pattern, not your actual problem.
You apply the fix. The error changes form. You paste the new error. Claude generates another explanation. Four iterations later you have four patches and a codebase that is harder to understand than when you started.
The fix is to change the question. Instead of "what is wrong?", ask "what do I know, and what do I need to find out?" Evidence collection before hypothesis formation is the discipline that prevents the guessing loop.
Phase 1: Reproduce Reliably
Before any analysis, before any hypothesis, reproduce the bug under controlled conditions.
A bug you can reproduce reliably is 80% debugged. You can observe it. You can change one variable at a time. You can verify when it is fixed. A bug you cannot reproduce is almost impossible to debug — you are chasing a ghost.
Reproduction requirements:
- Minimal trigger: the smallest input or action that consistently causes the bug
- Environment specification: OS, language version, framework version, any relevant configuration
- Frequency: always, intermittent (what percentage?), only under load, only after N operations
Tell Claude the reproduction steps before asking anything else. If you cannot produce minimal steps, that investigation is part of the debugging process.
An observation like "it only happens when qty is 0" is already a clue, and you only find clues like that by specifying reproduction conditions precisely.
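As an illustration, a minimal reproduction for a qty-is-0 bug might be a few lines of code that always trigger the failure. The discount function and the ZeroDivisionError here are hypothetical, not from any real codebase:

```python
# Hypothetical minimal reproduction: a checkout calculation that crashes
# only when qty is 0.
def per_unit_discount(total_discount: float, qty: int) -> float:
    # Suspected bug site: divides by qty without guarding against 0.
    return total_discount / qty

def reproduce():
    """Smallest input that consistently triggers the failure."""
    per_unit_discount(5.0, 0)  # raises ZeroDivisionError every time

if __name__ == "__main__":
    reproduce()
```

A script like this satisfies all three requirements at once: it is the minimal trigger, it pins the environment (whatever runs the script), and it establishes frequency (always).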
Phase 2: Collect Evidence
Evidence is observable fact. Hypothesis is interpretation. Collect evidence first.
Evidence to collect before asking Claude anything:
Error information:
- Exact error message (copy-paste, do not paraphrase)
- Full stack trace with line numbers
- HTTP status codes and response bodies for API errors
- Log output from around the time of the error
State information:
- What was the state of the system immediately before the error?
- What inputs led to this state?
- Did any data change recently (migration, config change, deployment)?
Change information:
- When did this bug first appear? What changed around that time?
- Run git log --oneline -20 and look for recent commits near the symptom area
- Run git bisect if the change is not obvious
Isolation information:
- Does the bug occur in a minimal reproduction without other components?
- Does it occur in staging but not local? (Environment difference)
- Does it occur for all users or specific users? (Data difference)
Present all of this to Claude at once. Not incrementally. The quality of Claude's analysis is proportional to the completeness of evidence you provide. A half-evidence conversation leads to half-correct hypotheses.
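One way to package that evidence into a single message is a fixed structure whose headings mirror the categories above. This is a sketch, not the skill's canonical format; the placeholders are yours to fill in:

```text
BUG REPORT (evidence only, no hypotheses yet)

Error:
- Exact message: <paste verbatim>
- Stack trace: <full trace with line numbers>
- Logs around the failure: <paste>

State:
- System state immediately before the error: <describe>
- Inputs that led to this state: <list>
- Recent data changes (migration / config / deploy): <yes or no, details>

Change history:
- First observed: <date or commit>
- git log --oneline -20 output: <paste>

Isolation:
- Reproduces in a minimal setup? <yes or no>
- Staging vs local? <where it occurs>
- All users or specific users? <which>
```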
Phase 3: Form Hypotheses
With evidence collected, ask Claude for hypotheses — plural. Not "what is wrong" but "given this evidence, what are the top 3 most likely root causes, ranked by probability?"
Asking for multiple hypotheses does two things. It prevents anchoring on the first explanation (Claude will generate a confident-sounding explanation regardless of certainty — asking for three forces acknowledgment of uncertainty). It gives you a ranked list to test.
Good hypothesis prompt:
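One possible phrasing, as a sketch rather than the skill's canonical text:

```text
Here is the evidence: [reproduction steps, stack trace, logs, recent changes].

Given this evidence, what are the top 3 most likely root causes, ranked by
probability? For each one, state what evidence would confirm it and what
evidence would rule it out.
```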
The "what evidence would confirm or rule it out" clause is the most important part. It turns Claude's analysis into a testable prediction, not just an opinion.
Phase 4: Bisect to Root Cause
Take the highest-probability hypothesis and test it. Bisection means narrowing the problem space by half with each test.
Binary search through code: If the bug is somewhere in a 200-line function, add a log at line 100 and observe whether the bug manifests before or after. Repeat. You find the exact line in about log2(n) iterations, roughly 8 for 200 lines, so even large functions narrow quickly.
Binary search through commits: If the bug appeared recently, git bisect automates the search:
Binary search through configuration: If the bug is environment-specific, isolate variables. Production vs staging differs on: environment variables, database data, network topology, service versions. Change one at a time.
Tell Claude the result of each bisection test. It adjusts hypothesis ranking based on your observations. This iterative narrowing is how root causes get found — not by the first guess being right, but by systematic elimination.
Phase 5: Fix, Test, Document
Fix only the root cause. Not the symptoms you observed along the way.
Applying the fix:
- Write a failing test that reproduces the bug (this goes in your test suite permanently)
- Apply the minimal fix to address the root cause
- Run the reproduction test — it should now pass
- Run the full test suite — it should still pass
- If you are not certain the fix is right, invoke SENTINEL
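As a sketch, the failing-then-passing regression test for the hypothetical qty-is-0 bug could look like this. The function name and the chosen fix (define the discount as 0 for zero units) are illustrative:

```python
def per_unit_discount(total_discount: float, qty: int) -> float:
    # Root-cause fix: a zero-unit order has no per-unit discount,
    # instead of dividing by zero.
    if qty == 0:
        return 0.0
    return total_discount / qty

def test_zero_qty_regression():
    # Before the fix this raised ZeroDivisionError; it now guards the bug forever.
    assert per_unit_discount(5.0, 0) == 0.0  # the exact input that crashed
    assert per_unit_discount(6.0, 3) == 2.0  # behavior elsewhere is unchanged
```

The first assertion encodes the exact reproduction from Phase 1; the second confirms the fix is minimal and did not change surrounding behavior.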
The failing test in step 1 is not optional. Without it, the same bug can reappear after a future refactor and you will have no early warning.
Documenting the pattern: After a successful fix, spend 2 minutes creating a pattern entry. Format:
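A minimal sketch of what an entry could contain; the fields and the webhook example are illustrative, not a prescribed schema:

```text
Pattern: webhook-timeout-under-retry-load
Symptom: webhook deliveries time out intermittently under load
Root cause: <one sentence describing the actual cause, not the symptom>
Evidence that pointed there: <the observation that confirmed the hypothesis>
Fix: <commit or PR link>  Regression test: <test name>
Date / author: <fill in>
```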
Pattern entries are searchable in future debugging sessions. The third time someone hits a webhook timeout, they find the pattern in 10 seconds instead of debugging for 3 hours.
The Canary Test Pattern
For bugs that cannot be reliably reproduced (race conditions, time-dependent failures, load-dependent failures), use the canary test pattern.
A canary test is a test that will fail if the bug recurs, even if you cannot reproduce it on demand right now. You write it based on your hypothesis about the root cause, deploy it, and monitor.
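For example, a canary for a suspected lost-update race might look like the following. The counter and the missing-lock hypothesis are fabricated for illustration; the point is that the test hammers the suspected condition hard enough that a recurrence is likely to surface:

```python
import threading

class SafeCounter:
    """Counter whose hypothesized race (a missing lock) has been fixed."""
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.value = 0

    def increment(self) -> None:
        with self._lock:  # remove this lock and the canary below can fail
            self.value += 1

def test_canary_no_lost_updates():
    # If the race recurs, some increments are lost and the count comes up short.
    counter = SafeCounter()

    def worker():
        for _ in range(1000):
            counter.increment()

    threads = [threading.Thread(target=worker) for _ in range(8)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    assert counter.value == 8 * 1000
```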
If a canary fails in CI six months later, you have caught a regression before it hit production. That is the value.
Knowing When to Escalate
Not every bug is a debugging problem. Some bugs are architecture problems wearing bug costumes.
Escalate to architecture review when:
- You have debugged the same area three times in the past month
- The fix requires touching more than 5 files
- The root cause is "the system was not designed for this case"
- Multiple people have tried to fix this and each fix created new bugs
Escalate to code review when:
- The fix changes behavior in ways that are hard to test
- You are not confident the fix does not have side effects
The systematic debugging skill integrates with tribunal for exactly this case. After Phase 5, if the confidence gate scores LOW or MEDIUM, code review is automatically invoked.
Full Debugging Session Template
For reference, here is the template for a complete debugging session with Claude:
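A sketch of what that template could contain; the structure follows the five phases above, and the exact wording is an assumption rather than the skill's canonical text:

```text
DEBUGGING SESSION: <one-line bug summary>

1. Reproduction
   - Minimal trigger: <steps>
   - Environment: <OS, versions, relevant config>
   - Frequency: <always / N% / only under load>

2. Evidence
   - Error: <exact message and full stack trace>
   - State: <system state and inputs immediately before failure>
   - Changes: <git log and deploys around first occurrence>
   - Isolation: <minimal repro? environment-specific? user-specific?>

3. Request
   Given this evidence, list the top 3 most likely root causes ranked by
   probability, and for each, what evidence would confirm or rule it out.
```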
Starting a debugging session with this template — even for bugs you think are simple — dramatically reduces the time to root cause. It forces you to collect evidence before asking questions, which is the entire discipline.
Key Takeaway
Effective AI-assisted debugging is evidence-first, not guess-first. The 5 phases: reproduce reliably, collect evidence, form multiple hypotheses (ranked), bisect to root cause, fix and document. Write a failing test before applying any fix — it becomes a permanent regression guard. Store the pattern after solving — it becomes a searchable asset for future debugging sessions. The canary test pattern handles non-reproducible bugs. Escalate when the problem is architectural, not just buggy.