def cohens_d(X, Y):
    return (
-        (mean(X) + mean(Y)) /
+        (mean(X) - mean(Y)) /
        sqrt(
            (len(X)*variance(X) + len(Y)*variance(Y)) /
            (len(X) + len(Y))
measured_m = population_with_error(0, 0.5, 10000)
true_d = cohens_d(true_f, true_m)
-print(true_d) # 1.0193773432617055 — d≈1.0, as expected!
+print(true_d) # 1.0069180384313943 — d≈1.0, as expected!
naïve_d = cohens_d(measured_f, measured_m)
-print(naïve_d) # 0.8953395386313235 — deflated!
+print(naïve_d) # 0.9012430127962895 — deflated!
```
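The size of that deflation can be checked against the classical attenuation formula. The sketch below is not the post's own code: it assumes the true scores have unit standard deviation within each group, that the groups differ by one standard deviation, and that `population_with_error` adds independent normal noise with standard deviation 0.5. The helper `attenuated_d` and the use of `pvariance` are illustrative choices.

```python
from math import sqrt
from random import gauss, seed
from statistics import mean, pvariance


def cohens_d(X, Y):
    """Mean difference divided by the size-weighted pooled standard deviation."""
    pooled_variance = (
        (len(X) * pvariance(X) + len(Y) * pvariance(Y)) / (len(X) + len(Y))
    )
    return (mean(X) - mean(Y)) / sqrt(pooled_variance)


def attenuated_d(true_d, error_sd, true_sd=1.0):
    """Hypothetical helper: d expected after adding independent error.

    Independent error adds error_sd**2 to the within-group variance and
    leaves the mean difference unchanged, so d shrinks by
    sqrt(1 + (error_sd / true_sd)**2).
    """
    return true_d / sqrt(1 + (error_sd / true_sd) ** 2)


seed(0)  # fixed seed so the run is repeatable
n = 10000

# Assumed setup: group means 1 and 0, unit SD, measurement error SD 0.5.
true_f = [gauss(1, 1) for _ in range(n)]
true_m = [gauss(0, 1) for _ in range(n)]
measured_f = [x + gauss(0, 0.5) for x in true_f]
measured_m = [x + gauss(0, 0.5) for x in true_m]

predicted = attenuated_d(1.0, 0.5)           # 1 / sqrt(1.25), about 0.894
simulated = cohens_d(measured_f, measured_m)  # should land near the prediction
print(predicted, simulated)
```

Under these assumptions the predicted value of about 0.894 is in line with the deflated figures printed above.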
But doesn't a similar argument hold for non-error sources of variance that are "orthogonal" to the group difference? Suppose performance on some particular cognitive task can be modeled as the sum of the general intelligence factor (zero or negligible sex difference), and a special ability factor that does show sex differences.[ref]Arthur Jensen, _The g Factor_, Chapter 13: "Although no evidence was found for sex differences in the mean level of _g_ or in the variability of _g_, there is clear evidence of marked sex differences in group factors and in test specificity. Males, on average, excel on some factors; females on others. [...] But the best available evidence fails to show a sex difference in _g_."[/ref] Then, even with zero measurement error, _d_ would underestimate the difference between women and men _of the same general intelligence_—
matched_m = performance(0, 0, 0, 10000)
population_d = cohens_d(population_f, population_m)
-print(population_d) # 0.7287587808164793 — deflated!
+print(population_d) # 0.7413662423265308 — deflated!
matched_d = cohens_d(matched_f, matched_m)
-print(matched_d) # 1.018362581243161 — as you would expect
+print(matched_d) # 1.0346898918452228 — as you would expect
```
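The gap between the two figures follows from the same variance arithmetic. As a sketch, assume performance is the sum of an independent general factor with within-group variance $\sigma_g^2$ (equal in both groups) and a special-ability factor with within-group variance $\sigma_s^2$ and group mean difference $\Delta$. These symbols are introduced here for illustration; the parameters the simulation actually uses are not shown in this excerpt.

$$
d_{\text{matched}} = \frac{\Delta}{\sigma_s},
\qquad
d_{\text{population}} = \frac{\Delta}{\sqrt{\sigma_s^2 + \sigma_g^2}}
= \frac{d_{\text{matched}}}{\sqrt{1 + \sigma_g^2 / \sigma_s^2}}
$$

Matching on general intelligence removes $\sigma_g^2$ from the denominator, which is why the matched comparison recovers the larger effect size.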