Conditional probability & Bayes

15 min
Learning goals
  • You can correctly distinguish a conditional probability from its inverse, for example P(+ | disease) from P(disease | +).
  • You can intuitively derive Bayes' formula via a hypothetical cohort.
  • You can explain why a second confirmation test is essential for rare conditions.
P(disease | test +) – PPV (positive predictive value)
7.5 %
7.20 of 96.5 positive findings are true positives.
P(healthy | test −) – NPV (negative predictive value)
99.9 %
903 of 904 negative findings are true negatives.
Frequency tree
A cohort of 1,000 people, broken down step by step.
Cohort: N = 1,000
  Diseased: P(D) = 0.80 %, 8.00 people
    Test + (TP): Sens = 90.0 %, 7.20 people
    Test − (FN): 10.0 %, 0.80 people
  Healthy: P(¬D) = 99.2 %, 992 people
    Test + (FP): 9.0 %, 89.3 people
    Test − (TN): Spec = 91.0 %, 903 people
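The tree can be reproduced with a few multiplications. The following is a minimal sketch, not the page's own implementation; the function name `frequency_tree` and its parameter names are chosen here for illustration.

```python
def frequency_tree(n, prevalence, sensitivity, specificity):
    """Split a cohort of n people into the four leaves of the frequency tree."""
    diseased = n * prevalence        # first branch: true state
    healthy = n - diseased
    tp = diseased * sensitivity      # diseased and test positive
    fn = diseased - tp               # diseased but test negative
    tn = healthy * specificity       # healthy and test negative
    fp = healthy - tn                # healthy but test positive
    return {"diseased": diseased, "healthy": healthy,
            "TP": tp, "FN": fn, "FP": fp, "TN": tn}


# Values taken from the tree above: N = 1,000, P(D) = 0.80 %, Sens = 90 %, Spec = 91 %.
tree = frequency_tree(n=1000, prevalence=0.008, sensitivity=0.90, specificity=0.91)
for name, count in tree.items():
    print(f"{name:>8}: {count:.2f}")
```

The unrounded leaves are 7.20, 0.80, 89.28 and 902.72; the tree shows the last two as 89.3 and 903.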
2 × 2 contingency table
True state (rows) × test result (columns), at N = 1,000.
            Test +       Test −       Σ
Diseased    7.20 (TP)    0.80 (FN)    8.00
Healthy     89.3 (FP)    903 (TN)     992
Σ           96.5         904          1,000
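Both predictive values can be read straight off the table: divide a cell by its column total. A short sketch using the rounded cell values shown above; the helper name `predictive_values` is an assumption made for this example.

```python
def predictive_values(tp, fn, fp, tn):
    """Return (PPV, NPV) from the four cells of a 2 x 2 table."""
    ppv = tp / (tp + fp)   # share of positive results that are true positives
    npv = tn / (tn + fn)   # share of negative results that are true negatives
    return ppv, npv


# Cell values as displayed in the table (already rounded).
ppv, npv = predictive_values(tp=7.20, fn=0.80, fp=89.3, tn=903)
print(f"PPV = {ppv:.1%}, NPV = {npv:.1%}")
```

With these inputs the PPV is 7.20 / 96.5 and the NPV is 903 / 903.8, which round to the 7.5 % and 99.9 % shown at the top.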
Bayes' formula with values substituted
We are looking for P(disease | +) = P(+ | disease) · P(disease) / P(+), where P(+) = P(+ | D) · P(D) + P(+ | ¬D) · P(¬D) and P(+ | ¬D) = 1 − Spec = 9.0 %.
P(D | +) = 90.0 % · 0.80 % / [90.0 % · 0.80 % + 9.0 % · 99.2 %]
P(D | +) = 0.0072 / [0.0072 + 0.0893]
P(D | +) = 7.46 %
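The same calculation as a function, which also makes the case for a confirmation test concrete. This is a sketch under one loud assumption that the page does not state: the second test errs independently of the first, given the person's true state. The name `posterior` is illustrative.

```python
def posterior(prior, sensitivity, specificity):
    """P(D | +) by Bayes' formula for a single positive test result."""
    true_pos = sensitivity * prior                 # P(+ | D) * P(D)
    false_pos = (1 - specificity) * (1 - prior)    # P(+ | not D) * P(not D)
    return true_pos / (true_pos + false_pos)


first = posterior(prior=0.008, sensitivity=0.90, specificity=0.91)
# Assumption: the second test is conditionally independent of the first,
# so the posterior after test one can serve as the prior for test two.
second = posterior(prior=first, sensitivity=0.90, specificity=0.91)
print(f"after one positive test:  {first:.2%}")
print(f"after two positive tests: {second:.2%}")
```

Under that independence assumption, a second positive result lifts the probability of disease from about 7.5 % to roughly 45 %. If both tests share a common source of error, the gain would be smaller.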
Dot grid: 1,000 hypothetical people
Each dot stands for one person. True positives are what we are looking for — false positives are the problem.
TP – true positive (7)
FN – false negative (1)
FP – false positive (89)
TN – true negative (903)
What does it tell us?
  • Base-rate neglect: Even though sensitivity is 90.0 %, only about 7.5 % of the positive findings are real. That is the effect of a rare disease: most positive results are false positives (89.3 of 96.5).
  • Raising the prevalence, say by targeted screening in a risk group, raises the PPV markedly even though the test's quality is unchanged. Conversely, the low prevalence in the general population is exactly why mass screening for rare conditions is tricky (Gigerenzer & Hoffrage 1995).
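To see the effect of prevalence in numbers, the PPV can be recomputed for several base rates while sensitivity and specificity stay fixed. A minimal sketch: apart from 0.80 %, the prevalence values in the list are illustrative assumptions, not figures from the page.

```python
def ppv(prevalence, sensitivity=0.90, specificity=0.91):
    """Positive predictive value for a given prevalence and test quality."""
    tp = sensitivity * prevalence                  # true-positive share of the cohort
    fp = (1 - specificity) * (1 - prevalence)      # false-positive share of the cohort
    return tp / (tp + fp)


# 0.008 is the prevalence used above; the other values are hypothetical.
prevalences = [0.001, 0.008, 0.05, 0.10, 0.30]
for p in prevalences:
    print(f"prevalence {p:6.1%} -> PPV {ppv(p):6.1%}")
```

With the same test, a prevalence of 10 % already gives a PPV just above 50 %, compared with about 7.5 % at a prevalence of 0.80 %.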