Inference
Compare groups with your own data
15 min
Learning goals
- You can correctly compare two groups with Welch's test and interpret the results.
- You can distinguish Cohen's d, Hedges' g, Glass's Δ, and the common language effect size.
- You can decide from box, density, and QQ plots which procedure fits your data.
Descriptive statistics
Two-group comparison: Caffeine vs. Placebo
| Group | n | M | SD | SE | Median | Q1 | Q3 | Min | Max |
|---|---|---|---|---|---|---|---|---|---|
| Caffeine | 25 | 253.28 | 8.43 | 1.69 | 253.00 | 247.00 | 260.00 | 239.00 | 267.00 |
| Placebo | 25 | 285.04 | 9.61 | 1.92 | 285.00 | 278.00 | 292.00 | 269.00 | 302.00 |
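
The columns of this table can be reproduced from raw values with Python's standard library. The following is a minimal sketch, not the page's own implementation: the function name `describe`, the example values, and the quartile convention (`method="inclusive"`) are assumptions, since the page does not state how it computes quartiles.

```python
import math
import statistics


def describe(values):
    """Return the summary columns of the table for one group."""
    n = len(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 in the denominator)
    # Assumption: "inclusive" quartiles; other conventions give slightly
    # different Q1/Q3 for small samples.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "n": n,
        "M": statistics.mean(values),
        "SD": sd,
        "SE": sd / math.sqrt(n),  # standard error of the mean
        "Median": median,
        "Q1": q1,
        "Q3": q3,
        "Min": min(values),
        "Max": max(values),
    }


# Hypothetical values for illustration, not the data behind the table.
example = [247, 250, 253, 256, 259]
summary = describe(example)
for name, value in summary.items():
    print(f"{name}: {value}")
```

The SE column in the table follows the same rule: 8.43 / √25 ≈ 1.69 and 9.61 / √25 ≈ 1.92.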
Diagnostics
What do the data actually look like?
Box and strip plot
Density estimate (KDE)
Normal QQ plot per group
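
To make the QQ plot less of a black box, the sketch below computes the points such a plot would show for one group. It is an illustration under assumptions: the function name `normal_qq_points`, the example values, and the plotting positions (i − 0.5) / n are not taken from the page, which may use a different convention.

```python
from statistics import NormalDist


def normal_qq_points(values):
    """Pair each sorted observation with a standard normal quantile."""
    xs = sorted(values)
    n = len(xs)
    std_normal = NormalDist()  # mean 0, SD 1
    points = []
    for i, x in enumerate(xs, start=1):
        p = (i - 0.5) / n  # plotting position (one common convention)
        points.append((std_normal.inv_cdf(p), x))
    return points


# Hypothetical values for illustration only.
example = [259, 247, 253, 250, 256]
qq = normal_qq_points(example)
for theoretical, observed in qq:
    print(f"{theoretical:6.3f}  {observed}")
```

If the points fall close to a straight line, the group is roughly normally distributed; systematic curvature at the ends points to skew or heavy tails.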
Test statistics
The Welch row is highlighted — the others serve as comparison.
| Test | Statistic | df | p-value | 95 % CI (M₁ − M₂) |
|---|---|---|---|---|
| Welch's t (recommended; no equality of variances assumed) | t = -12.42 | 47.2 | < .001 | [-36.90, -26.62] |
| Student's t (classical two-sample t-test with pooled variance) | t = -12.42 | 48 | < .001 | [-36.90, -26.62] |
| Mann-Whitney U (rank test; robust to outliers but sensitive to shape differences) | z = -6.05 | – | < .001 | – |
| Yuen's t, 20 % (t-test on 20 % trimmed means; robust against heavy tails and heteroscedasticity) | t = -9.56 | 27.9 | < .001 | – |
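
The Welch and Student rows can be checked by hand from the descriptive table. The sketch below uses only the group means, SDs, and sizes reported above; the function names are hypothetical, and p-values and confidence intervals are left out because they need the t distribution, which the standard library does not provide.

```python
import math


def welch_from_summary(m1, sd1, n1, m2, sd2, n2):
    """Welch's t and Welch-Satterthwaite df from summary statistics."""
    v1, v2 = sd1**2 / n1, sd2**2 / n2  # squared standard errors per group
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df


def student_from_summary(m1, sd1, n1, m2, sd2, n2):
    """Student's t with pooled variance, for comparison."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    t = (m1 - m2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df


# Values from the descriptive table: Caffeine first, Placebo second.
groups = (253.28, 8.43, 25, 285.04, 9.61, 25)
t_welch, df_welch = welch_from_summary(*groups)
t_student, df_student = student_from_summary(*groups)
print(f"Welch:   t = {t_welch:.2f}, df = {df_welch:.1f}")  # t = -12.42, df = 47.2
print(f"Student: t = {t_student:.2f}, df = {df_student}")  # t = -12.42, df = 48
```

With equal group sizes the two standard errors are algebraically identical, so the t statistics coincide and only the degrees of freedom differ.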
Effect sizes
Standardised differences, independent of the data's scale.
Cohen's d
-3.51
95 % CI: [-4.40, -2.63]
large (in favour of group 2)
Hedges' g
-3.46
95 % CI: [-4.33, -2.59]
Cohen's d with a small-sample correction
Glass's Δ
-3.30
95 % CI: [-4.39, -2.22]
Uses only group 2's SD (treated as control)
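
The three standardised differences above differ only in the denominator and in a correction factor. The following sketch recomputes the point estimates from the descriptive table. The function name is hypothetical, and the small-sample correction uses the common approximation J = 1 − 3 / (4·df − 1), which is an assumption about how the page computes Hedges' g; confidence intervals are not reproduced.

```python
import math


def effect_sizes(m1, sd1, n1, m2, sd2, n2):
    """Cohen's d, Hedges' g, and Glass's delta from summary statistics."""
    diff = m1 - m2
    df = n1 + n2 - 2
    pooled_sd = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df)
    d = diff / pooled_sd          # Cohen's d: pooled SD in the denominator
    j = 1 - 3 / (4 * df - 1)      # approximate small-sample correction
    g = d * j                     # Hedges' g
    delta = diff / sd2            # Glass's delta: SD of group 2 (control) only
    return d, g, delta


# Values from the descriptive table: Caffeine first, Placebo second.
d, g, delta = effect_sizes(253.28, 8.43, 25, 285.04, 9.61, 25)
print(f"Cohen's d = {d:.2f}")        # -3.51
print(f"Hedges' g = {g:.2f}")        # -3.46
print(f"Glass's delta = {delta:.2f}")  # -3.30
```

Because the Placebo SD (9.61) is larger than the pooled SD, Glass's Δ comes out slightly smaller in magnitude than Cohen's d.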
P(X > Y) — common language
0.0 %
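Probability that a randomly drawn observation from Caffeine exceeds one from Placebo. Here the two samples do not overlap at all: the largest Caffeine value (267.00) is below the smallest Placebo value (269.00).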
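
One way to estimate P(X > Y) is to compare every observation in one group with every observation in the other. The sketch below does that for small hypothetical samples. The function name and the data are assumptions, and ties are counted as "not exceeding", which is one possible convention (another common choice counts a tie as half).

```python
def common_language(xs, ys):
    """Share of (x, y) pairs in which x strictly exceeds y."""
    wins = sum(1 for x in xs for y in ys if x > y)
    return wins / (len(xs) * len(ys))


# Hypothetical samples for illustration only.
separated = common_language([1, 2, 3], [4, 5, 6])    # no overlap: 0.0
overlapping = common_language([1, 3], [2, 4])        # 1 of 4 pairs: 0.25
print(f"{separated:.1%}")
print(f"{overlapping:.1%}")
```

When the groups are completely separated, as in the first call, no pair counts as a win and the estimate is exactly 0.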
What stands out?
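- Welch and Student produce nearly identical results here: with equal group sizes (n = 25 each) the two t statistics coincide, and only the degrees of freedom differ (47.2 vs. 48). The variances also appear comparable (SD 8.43 vs. 9.61). Welch remains the recommended default.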