It's important not to overinterpret the IQ-scores-by-race results; there are a bunch of standard caveats that go here that everyone's treatment of the topic needs to include. Again, just because variance in a trait is statistically associated with variance in genes _within_ a population, does _not_ mean that differences in that trait _between_ populations are _caused_ by genes: [remember the illustrations about](#heritability-caveats) sun-deprived plants and internet-deprived red-haired children. Group differences in observed tested IQs are entirely compatible with a world in which those differences are entirely due to the environment imposed by an overtly or structurally racist society. Maybe the tests are culturally biased. Maybe people with higher socioeconomic status get more opportunities to develop their intellect, and racism impedes socioeconomic mobility. And so on.
The problem is, a lot of the blank-slatey environmentally-caused-differences-only hypotheses for group IQ differences start to look less compelling when you look into the details. "Maybe the tests are biased", for example, isn't an insurmountable defeater to the entire endeavor of IQ testing—it is _itself_ a falsifiable hypothesis, or can become one if you specify what you mean by "bias" in detail. One idea of what it would mean for a test to be _biased_ is if it's partially measuring something other than what it purports to be measuring: if your test measures a _combination_ of "intelligence" and "submission to the hegemonic cultural dictates of the test-maker", then individuals and groups that submit less to your cultural hegemony are going to score worse, and if you _market_ your test as unbiasedly measuring intelligence, then people who believe your marketing copy will be misled into thinking that those who don't submit are dumber than they really are. But if so, and if not all of your individual test questions are _equally_ loaded on intelligence and cultural-hegemony, then the cultural bias should _show up in the statistics_. If some questions are more "fair" and others are relatively more culture-biased, then you would expect the _order of item difficulties_ to differ by culture: the ["item characteristic curve"](/papers/baker-kim-the_item_characteristic_curve.pdf) plotting the probability of getting a biased question "right" as a function of _overall_ test score should differ by culture, with the hegemonic group finding it "easier" and others finding it "harder". Conversely, if the questions that discriminate most between differently-scoring cultural/ethnic/"racial" groups were the same as the questions that discriminate between (say) younger and older children _within_ each group, that would be the kind of statistical clue you would expect to see if the test was unbiased and the group difference was real.
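The item-bias check described above can be illustrated with a toy simulation (a sketch only; the group labels, item parameters, bias size, and sample sizes are all made up for illustration). Two groups with _identical_ latent ability answer five items under a simple logistic item-response model, but one "culturally loaded" item carries an extra difficulty penalty for the second group. Plotting (here, tabulating) the probability of getting that item right as a function of score on the _other_ items recovers the telltale signature: at every matched overall-score level, the penalized group finds the biased item "harder".

```python
# Toy differential-item-functioning check (illustrative only; all
# parameters here are invented, not taken from any real test).
import numpy as np

rng = np.random.default_rng(0)
n = 20_000  # examinees per group

def simulate(group_shift, bias):
    """Simulate responses to 5 items under a logistic (Rasch-style) model.

    `group_shift` moves the group's mean latent ability; `bias` is an
    extra difficulty penalty applied to item 0 for this group only
    (the hypothetical "culturally loaded" item).
    """
    theta = rng.normal(group_shift, 1.0, size=n)          # latent ability
    difficulty = np.array([0.0, -1.0, -0.5, 0.5, 1.0])    # item difficulties
    logits = theta[:, None] - difficulty[None, :]
    logits[:, 0] -= bias                                  # bias hits item 0 only
    return (rng.random((n, 5)) < 1 / (1 + np.exp(-logits))).astype(int)

fair = simulate(group_shift=0.0, bias=0.0)   # test-maker's own group
other = simulate(group_shift=0.0, bias=1.0)  # same true ability, biased item

def icc(resp):
    """Empirical item characteristic curve for item 0: P(correct)
    conditional on total score on the *other* items (a proxy for
    overall ability)."""
    rest = resp[:, 1:].sum(axis=1)
    return np.array([resp[rest == s, 0].mean() for s in range(5)])

print("fair group, item 0 ICC: ", np.round(icc(fair), 2))
print("other group, item 0 ICC:", np.round(icc(other), 2))
# At every matched rest-score level, the second group does worse on
# item 0, even though the two groups' latent abilities are identical:
# the bias shows up in the statistics.
```

Conversely, in a simulation where `bias=0.0` for both groups, the two curves coincide up to sampling noise even if the groups' mean abilities differ, which is the statistical fingerprint of an unbiased item.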