Inference

Compare groups with your own data

15 min

Learning goals

• You can compare two groups with Welch's test and interpret the results correctly.
• You can distinguish Cohen's d, Hedges' g, Glass's Δ, and the common language effect size.
• You can decide from box, density, and QQ plots which procedure fits your data.

Descriptive statistics

Two-group comparison: Caffeine vs. Placebo

Group	n	M	SD	SE	Median	Q1	Q3	Min	Max
Caffeine	25	253.28	8.43	1.69	253.00	247.00	260.00	239.00	267.00
Placebo	25	285.04	9.61	1.92	285.00	278.00	292.00	269.00	302.00
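The columns of this table can be reproduced from raw observations with Python's standard library. The following is a minimal sketch, not the lesson's own implementation: the function name `describe`, the example values, and the "inclusive" quartile convention are assumptions (tools differ in how they compute Q1 and Q3).

```python
import math
import statistics


def describe(values):
    """Summarise one group: n, M, SD, SE, median, quartiles, min, max."""
    n = len(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 in the denominator)
    # Quartile conventions differ between tools; "inclusive" is an assumption.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "n": n,
        "M": statistics.fmean(values),
        "SD": sd,
        "SE": sd / math.sqrt(n),  # standard error of the mean
        "Median": median,
        "Q1": q1,
        "Q3": q3,
        "Min": min(values),
        "Max": max(values),
    }


# Hypothetical values, not the lesson's data.
summary = describe([239, 247, 253, 260, 267])
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

The SE column follows directly from SD and n: for Caffeine, 8.43 / √25 ≈ 1.69, which matches the table.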

Diagnostics

What do the data actually look like?

Box and strip plot

Density estimate (KDE)

Normal QQ plot per group
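A normal QQ plot pairs each sorted observation with the value a normal distribution would predict at the same rank. The sketch below computes those pairs for one group. The function name `qq_points`, the example values, and the plotting positions (i − 0.5) / n are assumptions; other conventions exist and the page does not say which one it uses.

```python
from statistics import NormalDist, fmean, stdev


def qq_points(values):
    """Pair each sorted observation with its theoretical normal quantile."""
    ordered = sorted(values)
    n = len(ordered)
    # Reference normal with the sample's own mean and SD.
    reference = NormalDist(mu=fmean(ordered), sigma=stdev(ordered))
    # Plotting positions (i - 0.5) / n are one common convention (assumption).
    return [
        (reference.inv_cdf((i - 0.5) / n), x)
        for i, x in enumerate(ordered, start=1)
    ]


# Hypothetical values, not the lesson's data.
points = qq_points([239, 247, 253, 260, 267])
for theoretical, observed in points:
    print(f"{theoretical:8.2f}  {observed:8.2f}")
```

Points that fall close to the line "observed = theoretical" suggest approximately normal data. Systematic curvature at the ends points to skew or heavy tails.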

Test statistics

Welch's t is the recommended test; the other rows serve as comparison.

Test	Description	Statistic	df	p-value	95 % CI (M₁ − M₂)
Welch's t	Recommended; no equality of variances assumed	t = -12.42	47.2	< .001	[-36.90, -26.62]
Student's t	Classical two-sample t-test with pooled variance	t = -12.42	48	< .001	[-36.90, -26.62]
Mann-Whitney U	Rank test; robust to outliers but sensitive to shape differences	z = -6.05	–	< .001	–
Yuen's t (20 %)	t-test on 20 % trimmed means; robust against heavy tails and heteroscedasticity	t = -9.56	27.9	< .001	–
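The Welch and Student rows can be recomputed from the descriptive table alone. The sketch below uses the textbook formulas (Welch-Satterthwaite degrees of freedom for Welch, pooled variance for Student). The function names are illustrative, not the page's own code. p-values and confidence intervals are left out because they need the t distribution, which Python's standard library does not provide.

```python
import math


def welch_t(m1, sd1, n1, m2, sd2, n2):
    """Welch's t and Welch-Satterthwaite df from summary statistics."""
    v1, v2 = sd1**2 / n1, sd2**2 / n2  # squared standard errors per group
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df


def student_t(m1, sd1, n1, m2, sd2, n2):
    """Student's t with pooled variance; df = n1 + n2 - 2."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    t = (m1 - m2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df


# Summary statistics (M, SD, n) from the descriptive table.
caffeine = (253.28, 8.43, 25)
placebo = (285.04, 9.61, 25)

t_welch, df_welch = welch_t(*caffeine, *placebo)
t_student, df_student = student_t(*caffeine, *placebo)
print(f"Welch:   t = {t_welch:.2f}, df = {df_welch:.1f}")
print(f"Student: t = {t_student:.2f}, df = {df_student}")
```

With equal group sizes the two standard errors are algebraically identical, so both tests give the same t; only the degrees of freedom differ.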

Effect sizes

Standardised differences, independent of the data's scale.

Cohen's d

-3.51

95 % CI: [-4.40, -2.63]

large; the negative sign means group 2 (Placebo) has the higher mean
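Cohen's d divides the mean difference by the pooled standard deviation. A minimal sketch from the summary statistics in the descriptive table follows; the function name is illustrative, and the confidence interval is not reproduced here.

```python
import math


def cohens_d(m1, sd1, n1, m2, sd2, n2):
    """Mean difference divided by the pooled standard deviation."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled_var)


# Caffeine (group 1) vs. Placebo (group 2), values from the descriptive table.
d = cohens_d(253.28, 8.43, 25, 285.04, 9.61, 25)
print(f"Cohen's d = {d:.2f}")
```

The sign follows the order of subtraction (M₁ − M₂), so swapping the groups flips the sign but not the magnitude.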

Hedges' g

-3.46

95 % CI: [-4.33, -2.59]

Cohen's d with a small-sample correction
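The small-sample correction is commonly approximated by the factor J = 1 − 3 / (4·df − 1) with df = n₁ + n₂ − 2. The sketch below assumes this approximation; the page does not state which variant it uses. The input −3.5136 is Cohen's d before rounding, recomputed from the descriptive table.

```python
def hedges_g(d, n1, n2):
    """Apply the approximate small-sample correction factor J to Cohen's d."""
    df = n1 + n2 - 2
    j = 1 - 3 / (4 * df - 1)  # approximation; an exact gamma-based form exists
    return j * d


g = hedges_g(-3.5136, 25, 25)
print(f"Hedges' g = {g:.2f}")
```

J is always below 1, so g is slightly closer to zero than d; the gap shrinks as the samples grow.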

Glass's Δ

-3.30

95 % CI: [-4.39, -2.22]

Standardised by the SD of group 2 (Placebo) only, which is treated as the control group
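Glass's Δ differs from Cohen's d only in the denominator: it uses one group's SD instead of the pooled SD. A short sketch with the values from the descriptive table follows; the function name is illustrative.

```python
def glass_delta(m1, m2, sd_control):
    """Mean difference divided by the control group's SD."""
    return (m1 - m2) / sd_control


# Placebo (SD = 9.61) as control, as on this page.
delta_placebo = glass_delta(253.28, 285.04, 9.61)
# For comparison only: Caffeine (SD = 8.43) as control.
delta_caffeine = glass_delta(253.28, 285.04, 8.43)
print(f"Δ (Placebo SD)  = {delta_placebo:.2f}")
print(f"Δ (Caffeine SD) = {delta_caffeine:.2f}")
```

The choice of control group matters: the result depends on which SD goes into the denominator, so it should be stated alongside the value.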

P(X > Y) — common language

0.0 %

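Probability that a randomly drawn observation from Caffeine exceeds one from Placebo. It is 0.0 % here because the two ranges do not overlap: the largest Caffeine value (267) is below the smallest Placebo value (269).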
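One way to estimate this probability is to compare every Caffeine observation with every Placebo observation and count the share of pairs in which the first is larger. Whether the page computes it this way is an assumption, as are the function name and the example values; ties are counted as "not exceeding" here.

```python
from itertools import product


def prob_x_exceeds_y(xs, ys):
    """Share of all (x, y) pairs with x > y; ties count as not exceeding."""
    wins = sum(1 for x, y in product(xs, ys) if x > y)
    return wins / (len(xs) * len(ys))


# Hypothetical values that mimic the non-overlapping ranges in the table.
caffeine = [239, 247, 253, 260, 267]
placebo = [269, 278, 285, 292, 302]
p = prob_x_exceeds_y(caffeine, placebo)
print(f"P(X > Y) = {p:.1%}")
```

A model-based alternative derives the probability from the effect size under a normality assumption; with fully separated samples that estimate is small but not exactly zero.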

What stands out?

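→ Welch and Student give the same t (-12.42) and the same confidence interval here; only the degrees of freedom differ (47.2 vs. 48). With equal group sizes the two t statistics always coincide, and the SDs (8.43 vs. 9.61) are similar enough that the degrees of freedom barely change. Welch remains the recommended default.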