Statistics Interactive

Inference

Compare groups with your own data

Estimated time: 15 min
Learning goals
  • You can correctly compare two groups with Welch's test and interpret the results.
  • You can distinguish Cohen's d, Hedges' g, Glass's Δ, and the common language effect size.
  • You can decide from box, density, and QQ plots which procedure fits your data.
Descriptive statistics
Two-group comparison: Caffeine vs. Placebo
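| Group | n | M | SD | SE | Median | Q1 | Q3 | Min | Max |
|---|---|---|---|---|---|---|---|---|---|
| Caffeine | 25 | 253.28 | 8.43 | 1.69 | 253.00 | 247.00 | 260.00 | 239.00 | 267.00 |
| Placebo | 25 | 285.04 | 9.61 | 1.92 | 285.00 | 278.00 | 292.00 | 269.00 | 302.00 |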
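The columns of the table can be computed from raw observations with the Python standard library. The following is a minimal sketch, not the page's own implementation: the function name `describe`, the toy sample, and the "inclusive" quartile method are assumptions, and other quartile conventions give slightly different Q1 and Q3.

```python
import math
import statistics


def describe(values):
    """Return the summary columns of the table for one group."""
    n = len(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 in the denominator)
    # Assumed quartile convention; the page may use a different one.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "n": n,
        "M": statistics.mean(values),
        "SD": sd,
        "SE": sd / math.sqrt(n),  # standard error of the mean
        "Median": median,
        "Q1": q1,
        "Q3": q3,
        "Min": min(values),
        "Max": max(values),
    }


# Hypothetical toy sample, NOT the page's raw data.
toy = [239, 247, 253, 260, 267]
summary = describe(toy)
print(summary)

# The SE column follows directly from SD and n in the table:
caffeine_se = round(8.43 / math.sqrt(25), 2)
placebo_se = round(9.61 / math.sqrt(25), 2)
print(caffeine_se, placebo_se)  # 1.69 1.92
```

The last two lines reproduce the SE values shown for Caffeine and Placebo, which is a quick consistency check on the table.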
Diagnostics
What do the data actually look like?
[Figure panels: box and strip plot; density estimate (KDE); normal QQ plot per group]
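To make the QQ plot less of a black box, the coordinates behind it can be sketched in a few lines. This is an illustration under assumptions: the function name `normal_qq_points`, the toy sample, and the plotting positions (i − 0.5)/n are choices made here, and plotting tools differ in the positions they use.

```python
from statistics import NormalDist, mean, stdev


def normal_qq_points(values):
    """Pair each sorted observation with its theoretical normal quantile."""
    ordered = sorted(values)
    n = len(ordered)
    # Reference normal with the sample's own mean and SD.
    reference = NormalDist(mean(ordered), stdev(ordered))
    # Assumed plotting positions (i - 0.5) / n; other conventions exist.
    return [
        (reference.inv_cdf((i - 0.5) / n), observed)
        for i, observed in enumerate(ordered, start=1)
    ]


# Hypothetical toy sample, NOT the page's raw data.
toy = [239, 247, 253, 260, 267]
points = normal_qq_points(toy)
for theoretical, observed in points:
    print(f"{theoretical:8.2f}  {observed:8.2f}")
```

If the points fall close to the diagonal (theoretical ≈ observed), the normality assumption looks plausible for that group; systematic curvature at the ends suggests heavy or light tails.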
Test statistics
Welch's t is the recommended test; the other rows serve as comparison.
| Test | Statistic | df | p-value | 95 % CI (M₁ − M₂) |
|---|---|---|---|---|
| Welch's t | t = -12.42 | 47.2 | < .001 | [-36.90, -26.62] |
| Student's t | t = -12.42 | 48 | < .001 | [-36.90, -26.62] |
| Mann-Whitney U | z = -6.05 | | < .001 | |
| Yuen's t (20 %) | t = -9.56 | 27.9 | < .001 | |

  • Welch's t: recommended; no equality of variances assumed.
  • Student's t: classical two-sample t-test with pooled variance.
  • Mann-Whitney U: rank test; robust to outliers but sensitive to shape differences.
  • Yuen's t (20 %): t-test on 20 % trimmed means; robust against heavy tails and heteroscedasticity.
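The Welch row can be reproduced from the descriptive statistics alone. The sketch below uses only the rounded values from the descriptive table, so small rounding differences from the page's internal computation are possible; the function name `welch_from_summary` is hypothetical.

```python
import math


def welch_from_summary(m1, sd1, n1, m2, sd2, n2):
    """Welch's t and Welch-Satterthwaite df from summary statistics."""
    v1, v2 = sd1**2 / n1, sd2**2 / n2  # squared standard errors per group
    se_diff = math.sqrt(v1 + v2)       # SE of the mean difference
    t = (m1 - m2) / se_diff
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df, se_diff


# Rounded values from the descriptive table (Caffeine, then Placebo).
t, df, se_diff = welch_from_summary(253.28, 8.43, 25, 285.04, 9.61, 25)
print(f"t = {t:.2f}, df = {df:.1f}")  # t = -12.42, df = 47.2
```

The confidence interval is (M₁ − M₂) ± t_crit · SE, where t_crit is the 97.5 % quantile of the t distribution with the Welch df. The standard library has no t quantile; if SciPy is installed, `scipy.stats.t.ppf(0.975, df)` provides it.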
Effect sizes
Standardised differences, independent of the data's scale.
Cohen's d
-3.51
95 % CI: [-4.40, -2.63]
Large effect: group 2 (Placebo) has the higher mean
Hedges' g
-3.46
95 % CI: [-4.33, -2.59]
Cohen's d with a small-sample correction
Glass's Δ
-3.30
95 % CI: [-4.39, -2.22]
Uses only group 2's SD (treated as control)
P(X > Y) — common language
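0.0 %
Probability that a randomly drawn observation from Caffeine exceeds one from Placebo. It is zero here because the samples do not overlap: the Caffeine maximum (267.00) lies below the Placebo minimum (269.00).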
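The four effect sizes differ mainly in what they divide by or count. The following sketch computes d, g, and Δ from the rounded descriptive statistics and shows an empirical P(X > Y) on made-up data. The function names, the approximate Hedges correction factor 1 − 3/(4·df − 1), and counting ties as one half are assumptions; the page may use the exact correction or a different tie rule.

```python
import math


def effect_sizes(m1, sd1, n1, m2, sd2, n2):
    """Cohen's d, Hedges' g, and Glass's delta from summary statistics."""
    diff = m1 - m2
    df = n1 + n2 - 2
    pooled_sd = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df)
    d = diff / pooled_sd                # standardised by the pooled SD
    g = d * (1 - 3 / (4 * df - 1))      # approximate small-sample correction
    delta = diff / sd2                  # standardised by group 2's SD only
    return d, g, delta


def prob_superiority(x, y):
    """Share of all (x, y) pairs with x > y; ties count one half (assumed)."""
    wins = sum((a > b) + 0.5 * (a == b) for a in x for b in y)
    return wins / (len(x) * len(y))


# Rounded values from the descriptive table (Caffeine, then Placebo).
d, g, delta = effect_sizes(253.28, 8.43, 25, 285.04, 9.61, 25)
print(f"d = {d:.2f}, g = {g:.2f}, delta = {delta:.2f}")
# d = -3.51, g = -3.46, delta = -3.30

# Hypothetical non-overlapping samples, NOT the page's raw data.
p = prob_superiority([239, 253, 267], [269, 285, 302])
print(p)  # 0.0
```

Because Glass's Δ divides by one group's SD only, it is the measure that changes most when the two SDs differ.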
What stands out?
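  • Welch and Student produce almost identical results here: with equal group sizes (n = 25 each) the two t statistics coincide, and the SDs (8.43 vs. 9.61) are close enough that Welch's df (47.2) falls only slightly below Student's 48. Welch remains the recommended default.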
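Why the two tests agree so closely can be checked numerically. This sketch compares the standard error each test uses, first with the group sizes from the table and then with hypothetical unequal sizes (n = 10 and n = 40, chosen only for illustration); the function names are assumptions.

```python
import math


def welch_se(sd1, n1, sd2, n2):
    """SE of the mean difference without pooling the variances."""
    return math.sqrt(sd1**2 / n1 + sd2**2 / n2)


def pooled_se(sd1, n1, sd2, n2):
    """SE of the mean difference with a pooled variance (Student's t)."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    return math.sqrt(pooled_var * (1 / n1 + 1 / n2))


sd_caffeine, sd_placebo = 8.43, 9.61  # rounded SDs from the table

# Equal group sizes, as in the table: both SEs coincide.
equal_welch = welch_se(sd_caffeine, 25, sd_placebo, 25)
equal_pooled = pooled_se(sd_caffeine, 25, sd_placebo, 25)
print(f"n = 25/25: Welch {equal_welch:.4f}, Student {equal_pooled:.4f}")

# Hypothetical unequal group sizes: the SEs drift apart.
unequal_welch = welch_se(sd_caffeine, 10, sd_placebo, 40)
unequal_pooled = pooled_se(sd_caffeine, 10, sd_placebo, 40)
print(f"n = 10/40: Welch {unequal_welch:.4f}, Student {unequal_pooled:.4f}")

variance_ratio = sd_placebo**2 / sd_caffeine**2
print(f"variance ratio = {variance_ratio:.2f}")  # variance ratio = 1.30
```

With equal n the choice of test only affects the degrees of freedom; once group sizes and variances both differ, the pooled SE can be misleading, which is the usual argument for Welch as the default.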