"I mean, if that explanation actually makes you feel happier, then fine."

"Feeling happier isn't what explanations are for. Explanations are for predicting our observations.

"Emotions, too, are functional: happiness measures whether things in your life are going well or going poorly, but does not constitute things going well, much as a high reading on a thermometer measures heat as 'temperature' without itself being heat.

"If the explanation that predicts your observations makes you unhappy, then the explanation—and the unhappiness—are functioning as designed."

Book Review: Charles Murray's Human Diversity: The Biology of Gender, Race, and Class

This is a pretty good book about things we know about some ways in which people are different from each other, particularly differences in cognitive repertoires (Murray's choice of phrase for shaving nine syllables off "personality, abilities, and social behavior"). In my last book review, I mentioned that I had been thinking about broadening the topic scope of this blog, and this book review seems like an okay place to start!

Honestly, I feel like I already knew most of this stuff?—sex differences in particular are kind of my bag—but many of the details were new to me, and it's nice to have it all bundled together in a paper book with lots of citations that I can chase down later when I'm skeptical or want more details about a specific thing! The main text is littered with pleonastic constructions like "The first author was Jane Thisand-Such" (when discussing the results of a multi-author paper) or "Details are given in the note[n]", which feel clunky to read, but are so much better than the all-too-common alternative of authors not "showing their work".

In the first part of this blog post, I'm going to summarize what I learned from (or thought about, or was reminded of by) Human Diversity, but it would be kind of unhealthy for you to rely too much on tertiary blog-post summaries of secondary semi-grown-up-book literature summaries, so if these topics happen to strike your scientific curiosity, you should probably skip this post and go buy the source material—or maybe even a grown-up textbook!

The second part of this blog post is irrelevant.

Human Diversity is divided into three parts corresponding to the topics in the subtitle! (Plus another part if you want some wrapping-up commentary from Murray.) So the first part is about things we know about some ways in which female people and male people are different from each other!

The first (short) chapter is mostly about explaining Cohen's d effect sizes, which I think are solving a very important problem! When people say "Men are taller than women" you know they don't mean all men are taller than all women (because you know that they know that that's obviously not true), but that just raises the question of what they do mean. Saying they mean it "generally", "on average", or "statistically" doesn't really solve the problem, because that covers everything between-but-not-including "No difference" and "Yes, literally all women and all men". Cohen's d—the difference between two groups' means in terms of their pooled standard deviation—lets us give a quantitative answer to how much men are taller than women: I've seen reports of d ≈ 1.4–1.7 depending on the source, a lot smaller than the sex difference in murder rates (d ≈ 2.5), but much bigger than the difference in verbal skills (d ≈ 0.3, favoring women).

Once you have a quantitative effect size, then you can visualize the overlapping distributions, and the question of whether the reality of the data should be summarized in English as a "large difference" or a "small difference" becomes much less interesting, bordering on meaningless.
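For the curious, here's a minimal Python sketch of the arithmetic. The function names are my own inventions for illustration (nothing here is from Murray's book), and the two overlap-flavored formulas assume that both groups are normally distributed with equal variance, which real data only approximately satisfy.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def cohens_d(group_a, group_b):
    """Difference in means, in units of the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_variance = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_variance)


def probability_of_superiority(d):
    """Chance that a random member of the higher-scoring group outscores a
    random member of the other group (equal-variance normals assumed)."""
    return NormalDist().cdf(d / sqrt(2))


def overlap_coefficient(d):
    """Area shared by the two density curves (same assumptions)."""
    return 2 * NormalDist().cdf(-abs(d) / 2)
```

Under those assumptions, d = 1.5 works out to a probability of superiority of about 0.86 and an overlap of about 0.45, which says more than arguing about whether the word "large" applies.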

Murray also addresses the issue of aggregating effect sizes—something I've been meaning to get around to blogging about more exhaustively in this context of group differences (although at least, um, my favorite author on Less Wrong covered it in the purely abstract setting): small effect sizes in any single measurement (whatever "small" means) can amount to a big difference when you're considering many measurements at once. That's how people can distinguish female and male faces at 96% accuracy, even though no single measurement (like "eye width" or "nose height") offers that much predictive power.
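To make the aggregation point concrete, here's a toy calculation. It assumes something real faces don't satisfy: that every measurement is normally distributed with equal variance in both groups and is statistically independent of the others, in which case the per-measurement effect sizes combine like the sides of a right triangle (the Mahalanobis D for the uncorrelated case). The function names and the particular numbers (fifty measurements at d = 0.5) are made up for illustration, not taken from the face-classification literature.

```python
from math import sqrt
from statistics import NormalDist


def mahalanobis_d_uncorrelated(effect_sizes):
    """Combined effect size when the measurements are uncorrelated."""
    return sqrt(sum(d * d for d in effect_sizes))


def best_accuracy(combined_d):
    """Accuracy of the best possible classifier for two equally common
    groups (equal-variance normal distributions assumed)."""
    return NormalDist().cdf(combined_d / 2)


single = best_accuracy(0.5)
many = best_accuracy(mahalanobis_d_uncorrelated([0.5] * 50))
```

Under those assumptions, one d = 0.5 measurement gets you about 60% accuracy, while fifty of them together get you to about 96%. Correlated measurements carry redundant information, so a real dataset would need the full covariance matrix rather than this shortcut.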

Subsequent chapters address sex differences in personality, cognition, interests, and the brain. It turns out that women are more warm, empathetic, æsthetically discerning, and cooperative than men are! They're also more into the Conventional, Artistic, and Social dimensions of the Holland occupational-interests model.

You might think that this is all due to socialization, but then it's hard to explain why the same differences show up in different cultures—and why (counterintuitively) the differences seem larger in richer, more feminist countries. (Although as evolutionary anthropologist William Buckner points out in his social-media criticism of Human Diversity, W.E.I.R.D. samples from different countries aren't capturing the full range of human cultures.) You might think that the "larger differences in rich countries" result is an artifact: maybe people in less-feminist countries implicitly make within-sex comparisons when answering personality questions (e.g., "I'm competitive for a woman") whereas people in more-feminist countries use a less sexist standard of comparison, construing ratings as compared to people-in-general. Murray points out that this explanation still posits the existence of large sex differences in rich countries (while explaining away the unexpected cross-cultural difference-in-differences). Another possibility is that sexual dimorphism in general increases with wealth, including, e.g., in height and blood pressure, not just in personality. (I notice that this is consilient with the view that agriculture was a mistake that suppresses humans' natural tendencies, and that people revert to forager-like lifestyles in many ways as the riches of the industrial revolution let them afford it.)

Women are better at verbal ability and social cognition, whereas men are better at visuospatial skills. The sexes achieve similar levels of overall performance via somewhat different mental "toolkits." Murray devotes a section to a 2007 result of Johnson and Bouchard, who report that general intelligence "masks the dimensions on which [sex differences in mental abilities] lie": people's overall skill in using tools from the metaphorical mental toolbox leads to underestimates of differences in toolkits (that is, nonmetaphorically, the effect sizes of sex differences in specific mental abilities), which you want to statistically correct for. This result in particular is super gratifying to me personally, because I independently had a very similar idea a few months back—it's super validating as an amateur to find that the pros have been thinking along the same track!

The second part of the book is about some ways in which people with different ancestries are different from each other! Obviously, there are no "distinct" "races" (that would be dumb), but it turns out (as found by endeavors such as Li et al. 2008) that when you throw clustering and dimensionality-reduction algorithms at SNP data (single nucleotide polymorphisms, places in the genome where more than one allele has non-negligible frequency), you get groupings that are a pretty good match to classical or self-identified "races".

Ask the computer to assume that an individual's ancestry came from K fictive ancestral populations where K := 2, and it'll infer that sub-Saharan Africans are descended entirely from one, East Asians and some Native Americans are descended entirely from the other, and everyone else is an admixture. But if you set K := 3, populations from Europe and the Near East (which were construed as admixtures in the K := 2 model) split off as a new inferred population cluster. And so on.

These ancestry groupings are a "construct" in the sense that the groupings aren't "ordained by God"—the algorithm can find K groupings for your choice of K—but where it draws those category boundaries is a function of the data. The construct is doing cognitive work, concisely summarizing statistical regularities in the dataset (which is too large for humans to hold in their heads all at once): a map that reflects a territory.

Twentieth-century theorists like Fisher and Haldane and whatshisface-the-guinea-pig-guy had already figured out a lot about how evolution works (stuff like, a mutation that confers a fitness advantage of s has a probability of about 2s of sweeping to fixation), but a lot of hypotheses about recent human evolution weren't easy to test or even formulate until the genome was sequenced!
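If you want to check the 2s rule yourself, here's a sketch of the classic branching-process argument. It assumes that each carrier of the new mutation leaves a Poisson-distributed number of carrier offspring with mean 1 + s, and that the population is big enough that "the lineage never dies out" is a good stand-in for "sweeps to fixation". The function name and the tolerance settings are mine.

```python
from math import exp


def survival_probability(s, tolerance=1e-14, max_iterations=1_000_000):
    """Chance that a new mutation's lineage never dies out, assuming each
    carrier leaves a Poisson(1 + s) number of carrier offspring."""
    q = 0.0  # probability that the lineage is already extinct at the start
    q_next = q
    for _ in range(max_iterations):
        # Extinct by the next generation: every offspring's line has died.
        q_next = exp((1 + s) * (q - 1))
        if abs(q_next - q) < tolerance:
            break
        q = q_next
    return 1 - q_next
```

For s = 0.01 this comes out a little under 0.02, so the rule of thumb holds up, and it also makes vivid that a mutation with a 1% advantage gets lost about 98% of the time.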

You might think that there wasn't enough time in the 2–5k generations since we came forth out of Africa for much human evolution to take place: a new mutation needs to confer an unusually large benefit to sweep to fixation that fast. But what if you didn't actually need any new mutations? Natural selection on polygenic traits can also act on "standing variation": variation already present in the population that was mostly neutral in previous environments, but is fitness-relevant to new selection pressures. The rapid response to selective breeding observed in domesticated plants and animals mostly doesn't depend on new mutations.

Another mechanism of recent human evolution is introgression: early humans interbred with our Neanderthal and Denisovan "cousins", giving our lineage the chance to "steal" all their good alleles! In contrast to new mutations, which usually die out even when they're beneficial (that 2s rule again), alleles "flowing" from another population keep getting reintroduced, giving them more chances to sweep!

Population differences are important when working with genome-wide association studies, because a model "trained on" one population won't perform as well against the "test set" of a different population. Suppose you do a big study and find a bunch of SNPs that correlate with a trait, like schizophrenia or liking opera. The frequencies of those SNPs for two populations from the same continent (like Japanese and Chinese) will hugely correlate (Pearson's r ≈ 0.97), but for more genetically-distant populations from different continents, the correlation will still be big but not huge (like r ≈ 0.8 or whatever).

What do these differences in SNP frequencies mean in practice?? We ... don't know yet. At least some population differences are fairly well-understood: I'd tell you about sickle-cell and lactase persistence, except then I would have to scream. There are some cases where we see populations independently evolve different adaptations that solve the same problem: people living on the plateaus of both Tibet and Peru have both adapted to high altitudes, but the Tibetans did it by breathing faster and the Peruvians did it with more hemoglobin!

Sorry, "the Tibetans did it with ..." is sloppy phrasing on my part; what I actually mean is that the Tibetans who weren't genetically predisposed to breathe faster were more likely to die without leaving children behind. That's how evolution works!

The third part of the book is about genetic influences on class structure! Untangling the true causes of human variation is a really hard technical philosophy problem, but behavioral geneticists have at least gotten started with their simple ACE model. It works like this: first, assume (that is, "pretend") that the genetic variation for a trait is additive (if you have the appropriate SNP, you get more of the trait), rather than exhibiting epistasis (where the effects of different loci interfere with each other) or Mendelian dominance (where the presence of just one copy of an allele (of two) determines the phenotype, and it doesn't matter whether you heterozygously have a different allele as your second version of that gene). Then we pretend that we can partition the variance in phenotypes as the sum of the "additive" genetic variance A, plus the environmental variance "common" within a family C, plus "everything else" (including measurement "error" and the not-shared-within-families "environment") E. Briefly (albeit at the risk of being cliché): nature, nurture, and noise.

Then we can estimate the sizes of the A, C, and E components by studying fraternal and identical twins. (If you hear people talking about "twin studies", this is what they mean—not case studies of identical twins raised apart, which are really cool but don't happen very often.) Both kinds of twins have the same family environment C at the same time (parents, socioeconomic status, schools, &c.), but identical twins are twice as genetically related to each other as fraternal twins, so the extent to which the identical twins are more similar is going to pretty much be because of their genes. "Pretty much" in the sense that while there are ways in which the assumptions of the model aren't quite true (assortative mating makes fraternal twins more similar in the ways their parents were already similar before mating, identical twins might get treated more similarly by "the environment" on account of their appearance), Murray assures us that the experts assure us that the quantitative effect of these deviations is probably pretty small!
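The simplest version of that estimate is Falconer's formulas, which only need the trait correlation among identical (MZ) twin pairs and among fraternal (DZ) twin pairs. Here's a sketch; the function name is mine and the example correlations are hypothetical numbers chosen to make the arithmetic easy, not results from any study. (As I understand it, real behavioral-genetics papers usually fit fancier models than this.)

```python
def falconer_ace(r_mz, r_dz):
    """Split trait variance into A, C, and E from twin correlations.

    Under the ACE assumptions, r_mz = A + C and r_dz = A/2 + C.
    """
    a = 2 * (r_mz - r_dz)  # additive genetic share
    c = 2 * r_dz - r_mz    # shared (family) environment share
    e = 1 - r_mz           # everything else, including measurement error
    return a, c, e


# Hypothetical correlations, for illustration only.
a, c, e = falconer_ace(r_mz=0.8, r_dz=0.5)
```

With those made-up inputs, you get A = 0.6, C = 0.2, and E = 0.2. Note that nothing stops C from coming out negative if the identical-twin correlation is more than twice the fraternal one, which is a hint that the additivity assumption is being violated.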

Anyway, it turns out that the effect of the shared environment C for most outcomes is smaller than most people intuitively expect—actually close to zero for personality and adult intelligence specifically! Sometimes sloppy popularizers summarize this as "parenting doesn't matter" in full generality, but it depends on the trait or outcome you're measuring: for example, the shared environment component gets up to 25% for years-of-schooling ("educational attainment") and 36% for "basic interpersonal interactions." Culture obviously exists, but for underlying psychological traits, the part of the environment that matters is mostly not shared by siblings in the same family—not the part of the environment we know how to control. Thus, a lot of economic and class stratification actually ends up being along genetic lines: the nepotism of family wealth can buy opportunities and second chances, but it doesn't actually live your life for you.

It's important not to overinterpret the heritability results; there are a bunch of standard caveats that go here that everyone's treatment of the topic needs to include! Heritability is about the variance in phenotypes that can be predicted by variance in genes. This is not the same concept as "controlled by genes." To see this, notice that the trait "number of heads" has a heritability of zero because the variance is zero: all living people have exactly one head. (Siamese twins are two people.) Heritability estimates are also necessarily bound to a particular population in a particular place and time, which can face constraints shaped solely by the environment. If you plant half of a batch of seeds in the shade and half in the sun, the variance in the heights of the resulting plants will be associated with variance in genes within each group, but the difference between the groups is solely determined by the sunniness of their environments. Likewise, in a Society with a cruel caste system under which children with red hair are denied internet access, part of the heritability of intellectual achievement is going to come from alleles that code for red hair. Even though (ex hypothesi) redheads have the same inherent intellectual potential as everyone else, the heritability computation can't see into worlds that are not our own, which might have vastly different gene–environment correlations.
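The seeds-in-the-sun-and-shade example is easy to simulate. In this toy model, every number (the baseline height of 10, the sunlight bonus of 5, the noise level) is an arbitrary choice of mine, each plant's height is just its genetic value plus its plot's sunlight bonus plus noise, and "heritability" is measured as the squared correlation between genetic value and height.

```python
import random
from statistics import mean, pvariance


def squared_correlation(xs, ys):
    """Fraction of the variance in ys linearly predictable from xs."""
    mean_x, mean_y = mean(xs), mean(ys)
    covariance = mean([(x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)])
    return covariance ** 2 / (pvariance(xs) * pvariance(ys))


def grow_plants(n, sunlight_bonus, rng):
    """Return (genetic values, heights) for n plants sharing one plot."""
    genes = [rng.gauss(0, 1) for _ in range(n)]
    heights = [10 + sunlight_bonus + g + rng.gauss(0, 0.5) for g in genes]
    return genes, heights


rng = random.Random(0)  # fixed seed so the run is repeatable
sun_genes, sun_heights = grow_plants(5000, 5.0, rng)
shade_genes, shade_heights = grow_plants(5000, 0.0, rng)

within_sun = squared_correlation(sun_genes, sun_heights)
within_shade = squared_correlation(shade_genes, shade_heights)
pooled = squared_correlation(sun_genes + shade_genes,
                             sun_heights + shade_heights)
height_gap = mean(sun_heights) - mean(shade_heights)
gene_gap = mean(sun_genes) - mean(shade_genes)
```

Within each plot, genes predict most of the variation in height (about 80% with these settings), yet the gap in average height between the plots is about 5 while the gap in average genetic value is about zero. Pool the two plots and the same genes predict far less of the variance: the "heritability" changed without any plant's genes changing.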

(I speculate that heritability calculations being so Society-bound might help make sense of the "small role of the shared environment" results that many still balk at. If the population you're studying goes to public schools—or schools at all, as contrasted to other ways of living and learning—that could suppress a lot of the variance that might otherwise occur in families.)

Old-timey geneticists used to think that they would find a small number of "genes for" something, but it turns out that we live in an omnigenic, pleiotropic world where lots and lots of SNPs each exert a tiny effect on potentially lots and lots of things. I feel like this probably shouldn't have been surprising (genes code for amino-acid sequences, variation in what proteins get made from those amino-acid sequences is going to affect high-level behaviors, but high-level behaviors involve lots of proteins in a super-complicated unpredictable way), but I guess it was.

Murray's penultimate chapter summarizes the state of a debate between a "Robert Plomin school" and an "Eric Turkheimer school" on the impact and import of polygenic scores, where we tally up all the SNPs someone has that are associated with a trait of interest.
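Mechanically, the tally is about as simple as it sounds. In the usual setup (as I understand it), each SNP gets a weight from an association study, and a person's score is the weighted count of the relevant alleles they carry. The SNP names and weights below are entirely hypothetical placeholders, and the function name is mine.

```python
def polygenic_score(allele_counts, weights):
    """Weighted tally of trait-associated alleles.

    allele_counts maps a SNP name to 0, 1, or 2 copies of the counted allele;
    weights maps a SNP name to its estimated per-copy effect on the trait.
    SNPs missing from allele_counts are treated as zero copies.
    """
    return sum(w * allele_counts.get(snp, 0) for snp, w in weights.items())


# Hypothetical SNPs and effect sizes, for illustration only.
weights = {"snp_a": 0.5, "snp_b": -0.25, "snp_c": 0.125}
person = {"snp_a": 2, "snp_b": 1, "snp_c": 0}
score = polygenic_score(person, weights)
```

Here the score is 2 × 0.5 − 1 × 0.25 + 0 × 0.125 = 0.75. A real score sums over thousands or millions of SNPs, and the weights inherit all the limitations of the population they were estimated in.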

The starry-eyed view epitomized by Plomin says that polygenic scores are super great and everyone and her dog should be excited about them: they're causal in only one direction (the trait can't cause the score) and they let us assess risks in individuals before they happen. Clinical psychology will enter a new era of "positive genomics", where we understand how to work with the underlying dimensions along which people vary (including positively), rather than focusing on treating "diagnoses" that people allegedly "have".

The curmudgeonly view epitomized by Turkheimer says that Science is about understanding the causal structure of phenomena, and that polygenic scores don't fucking tell us anything. Marital status is heritable in the same way that intelligence is heritable, not because there are "divorce genes" in any meaningful biological sense, but because of a "universal, nonspecific genetic pull on everything": on average, people with more similar genes will make more similar proteins from those similar genes, and therefore end up with more similar phenotypes that interact with the environment in a more similar way, and eventually (the causality flowing "upwards" through many hierarchical levels of organization) this shows up in the divorce statistics of a particular Society in a particular place and time. But this is opaque and banal; the real work of Science is in figuring out what all the particular gene variations actually do.

Notably, Plomin and Turkheimer aren't actually disagreeing here: it's a difference in emphasis rather than facts. Polygenic scores don't explain mechanisms—but might they end up being useful, and used, anyway? Murray's vision of social science is content to make predictions and "explain variance" while remaining ignorant of ultimate causality. (Murray compares polygenic scores to "economic indexes predicting GDP growth", which is not necessarily a reassuring analogy to those who doubt how much of GDP represents real production rather than the "exhaust heat" of zero-sum contests in an environment of manufactured scarcity and artificial demand.) Meanwhile, my cursory understanding (while kicking myself for still not having put in the hours to get much farther into Probabilistic Graphical Models: Principles and Techniques) was that you need to understand causality in order to predict what interventions will have what effects: variance in rain may be statistically "explained by" variance in mud puddles, but you can't make it rain by turning the hose on. Maybe our feeble state of knowledge is why we don't know how to find reliable large-effect environmental interventions that still yet might exist in the vastness of the space of possible interventions.
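The rain-and-mud point can be shown with a toy simulation, in which rain happens on 30% of days (an arbitrary number I picked), mud appears whenever it rains or the hose is on, and nothing causes rain. All the names here are made up for the illustration.

```python
import random


def simulate_day(rng, hose_on=False):
    """Return (rain, mud) for one day."""
    rain = rng.random() < 0.3  # rain does not depend on anything we do
    mud = rain or hose_on      # mud is caused by rain or by the hose
    return rain, mud


def rain_rate_given_mud(days):
    """Fraction of muddy days on which it rained."""
    rained_on_muddy_days = [rain for rain, mud in days if mud]
    return sum(rained_on_muddy_days) / len(rained_on_muddy_days)


rng = random.Random(0)  # fixed seed so the run is repeatable
observed = [simulate_day(rng) for _ in range(10_000)]
intervened = [simulate_day(rng, hose_on=True) for _ in range(10_000)]

p_rain_if_mud_observed = rain_rate_given_mud(observed)
p_rain_if_mud_forced = rain_rate_given_mud(intervened)
```

When you only watch, mud predicts rain perfectly. When you force the mud yourself, it rains at the same base rate as ever, so the predictive relationship told you nothing about what your intervention would do.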

There are also some appendices at the back of the book! Appendix 1 (reproduced from, um, one of Murray's earlier books with a coauthor) explains some basic statistics concepts. Appendix 2 ("Sexual Dimorphism in Humans") goes over the prevalence of intersex conditions and gays, and then—so much for this post broadening the topic scope of this blog—transgender typology! Murray presents the Blanchard–Bailey–Lawrence–Littman view as fact, which I think is basically correct, but a more comprehensive treatment (which I concede may be too much to hope for from a mere Appendix) would have at least mentioned alternative views (Serano? Veale?), if only to explain why they're worth dismissing. (Contrast to the eight pages in the main text explaining why "But, but, epigenetics!" is worth dismissing.) Then Appendix 3 ("Sex Differences in Brain Volumes and Variance") has tables of brain-size data, and an explanation of the greater-male-variance hypothesis. Cool!

... and that's the book review that I would prefer to write. A science review of a science book, for science nerds: the kind of thing that would have no reason to draw your attention if you're not genuinely interested in Mahalanobis D effect sizes or adaptive introgression or Falconer's formulas, for their own sake, or (better) for the sake of compressing the length of the message needed to encode your observations.

But that's not why you're reading this. That's not why Murray wrote the book. That's not even why I'm writing this. We should hope—emphasis on the should—for a discipline of Actual Social Science, whose practitioners strive to report the truth, the whole truth, and nothing but the truth, with the same passionately dispassionate objectivity they might bring to the study of beetles, or algebraic topology—or that an alien superintelligence might bring to the study of humans.

We do not have a discipline of Actual Social Science. Possibly because we're not smart enough to do it, but perhaps more so because we're not smart enough to want to do it. No one has an incentive to lie about the homotopy groups of an n-sphere. If you're asking questions about homotopy groups at all, you almost certainly care about getting the right answer for the right reasons. At most, you might be biased towards believing your own conjectures in the optimistic hope of achieving eternal algebraic-topology fame and glory, like Ruth Lawrence. But nothing about algebraic topology is going to be morally threatening in a way that will leave you fearing that your ideological enemies have seized control of the publishing-houses to plant lies in the textbooks to fuck with your head, or sobbing that a malicious God created the universe as a place of evil.

Okay, maybe that was a bad example; topology in general really is the kind of mindfuck that might be the design of an adversarial agency. (Remind me to tell you about the long line, which is like the line of real numbers, except much longer.)

In any case, as soon as we start to ask questions about humans—and far more so identifiable groups of humans—we end up entering the domain of politics.

We really shouldn't. Everyone should perceive a common interest in true beliefs—maps that reflect the territory, simple theories that predict our observations—because beliefs that make accurate predictions are useful for making good decisions. That's what "beliefs" are for, evolutionarily speaking: my analogues in humanity's environment of evolutionary adaptedness were better off believing that (say) the berries from some bush were good to eat if and only if the berries were actually good to eat. If my analogues unduly-optimistically thought the berries were good when they actually weren't, they'd get sick (and lose fitness), but if they unduly-pessimistically thought the berries were not good when they actually were, they'd miss out on valuable calories (and fitness).

(Okay, this story is actually somewhat complicated by the fact that evolution didn't "figure out" how to build brains that keep track of probability and utility separately: my analogues in the environment of evolutionary adaptedness might also have been better off assuming that a rustling in the bush was a tiger, even if it usually wasn't a tiger, because failing to detect actual tigers was so much more costly (in terms of fitness) than erroneously "detecting" an imaginary tiger. But let this pass.)

The problem is that, while any individual should always want true beliefs for themselves in order to navigate the world, you might want others to have false beliefs in order to trick them into mis-navigating the world in a way that benefits you. If I'm trying to sell you a used car, then—counterintuitively—I might not want you to have accurate beliefs about the car, if that would reduce the sale price or result in no deal. If our analogues in the environment of evolutionary adaptedness regularly faced structurally similar situations, and if it's expensive to maintain two sets of beliefs (the real map for ourselves, and a fake map for our victims), we might end up with a tendency not just to be lying motherfuckers who deceive others, but also to self-deceive in situations where the payoffs (in fitness) of tricking others outweighed those of being clear-sighted ourselves.

That's why we're not smart enough to want a discipline of Actual Social Science. The benefits of having a collective understanding of human behavior—a shared map that reflects the territory that we are—could be enormous, but beliefs about our own qualities, and those of socially-salient groups to which we belong (e.g., sex, race, and class) are exactly those for which we face the largest incentive to deceive and self-deceive. Counterintuitively, I might not want you to have accurate beliefs about the value of my friendship (or the disutility of my animosity), for the same reason that I might not want you to have accurate beliefs about the value of my used car. That makes it a lot harder not just to get the right answer for the right reasons, but also to trust that your fellow so-called "scholars" are trying to get the right answer, rather than trying to sneak self-aggrandizing lies into the shared map in order to fuck you over. You can't just write a friendly science book for oblivious science nerds about "things we know about some ways in which people are different from each other", because almost no one is that oblivious. To write and be understood, you have to do some sort of positioning of how your work fits into the war over the shared map.

Murray positions Human Diversity as a corrective to a "blank slate" orthodoxy that refuses to entertain any possibility of biological influences on psychological group differences. The three parts of the book are pitched not simply as "stuff we know about biologically-mediated group differences" (the oblivious-science-nerd approach that I would prefer), but as a rebuttal to "Gender Is a Social Construct", "Race Is a Social Construct", and "Class Is a Function of Privilege." At the same time, however, Murray is careful to position his work as nonthreatening: "there are no monsters in the closet," he writes, "no dread doors that we must fear opening." He likewise "state[s] explicitly that [he] reject[s] claims that groups of people, be they sexes or races or classes, can be ranked from superior to inferior [or] that differences among groups have any relevance to human worth or dignity."

I think this strategy is sympathetic but ultimately ineffective. Murray is trying to have it both ways: challenging the orthodoxy, while denying the possibility of any unfortunate implications of the orthodoxy being false. It's like ... theistic evolution: satisfactory as long as you don't think about it too hard, but among those with a high need for cognition, who know what it's like to truly believe (as I once believed), it's not going to convince anyone who hasn't already broken from the orthodoxy.

Murray concludes, "Above all, nothing we learn will threaten human equality properly understood." I strongly agree with the moral sentiment, the underlying axiology that makes this seem like a good and wise thing to say.

And yet I have been ... trained. Trained to instinctively apply my full powers of analytical rigor and skepticism to even that which is most sacred. Because my true loyalty is to the axiology—to the process underlying my current best guess as to that which is most sacred. If that which was believed to be most sacred turns out to not be entirely coherent ... then we might have some philosophical work to do, to reformulate the sacred moral ideal in a way that's actually coherent.

"Nothing we learn will threaten X properly understood." When you elide the specific assignment X := "human equality", the form of this statement is kind of suspicious, right? Why "properly understood"? It would be weird to say, "Nothing we learn will threaten the homotopy groups of an n-sphere properly understood."

This kind of claim to be non-disprovable seems like the kind of thing you would only invent if you were secretly worried about X being threatened by new discoveries, and wanted to protect your ability to backtrack and re-gerrymander your definition of X to protect what you (think that you) currently believe.

If being an oblivious science nerd isn't an option, half-measures won't suffice. I think we can do better by going meta and analyzing the functions being served by the constraints on our discourse and seeking out clever self-aware strategies for satisfying those functions without lying about everything. We mustn't fear opening the dread meta-door in front of whether there actually are dread doors that we must fear opening.

Why is the blank slate doctrine so compelling, that so many feel the need to protect it at all costs? (As I once felt the need.) It's not ... if you've read this far, I assume you will forgive me—it's not scientifically compelling. If you were studying humans the way an alien superintelligence would, trying to get the right answer for the right reasons (which can conclude conditional answers: if what humans are like depends on choices about what we teach our children, then there will still be a fact of the matter as to what choices lead to what outcomes), you wouldn't put a whole lot of prior probability on the hypothesis "Both sexes and all ancestry-groupings of humans have the same distribution of psychological predispositions; any observed differences in behavior are solely attributable to differences in their environments." Why would that be true? We know that sexual dimorphism exists. We know that reproductively isolated populations evolve different traits to adapt to their environments, like those birds with differently-shaped beaks that Darwin saw on his boat trip. We could certainly imagine that none of the relevant selection pressures on humans happened to touch the brain—but why? Wouldn't that be kind of a weird coincidence?

If the blank slate doctrine isn't scientifically compelling—it's not something you would invent while trying to build shared maps that reflect the territory—then its appeal must have something to do with some function it plays in conflicts over the shared map, where no one trusts each other to be doing Actual Social Science rather than lying to fuck everyone else over.

And that's where the blank slate doctrine absolutely shines—it's the Schelling point for preventing group conflicts! (A Schelling point is a choice that's salient as a focus for mutual expectations: what I think that you think that I think ... &c. we'll choose.) If you admit that there could be differences between groups, you open up the questions of in what exact traits and of what exact magnitudes, which people have an incentive to lie about to divert resources and power to their group by establishing unfair conventions and then misrepresenting those contingent bargaining equilibria as some "inevitable" natural order.

If you're afraid of purported answers being used as a pretext for oppression, you might hope to make the question un-askable. Can't oppress people on the basis of race if race doesn't exist! Denying the existence of sex is harder—which doesn't stop people from occasionally trying. "I realize I am writing in an LGBT era when some argue that 63 distinct genders have been identified," Murray notes at the beginning of Appendix 2. But this oblique acerbity fails to pass the Ideological Turing Test. The language of "has been identified" suggests an attempt at scientific taxonomy—a project, which I share with Murray, of fitting categories to describe a preexisting objective reality. But I don't think the people making 63-item typeahead select "Gender" fields for websites are thinking in such terms to begin with. The specific number 63 is ridiculous and can't exist; it might as well be, and often is, a fill-in-the-blank free text field. Despite being insanely evil (where I mean the adjective literally rather than as a generic intensifier—evil in a way that is of or related to insanity), I must acknowledge this is at least good game theory. If you don't trust taxonomists to be acting in good faith—if you think we're trying to bulldoze the territory to fit a preconceived map—then destroying the language that would be used to build oppressive maps is a smart move.

The taboo mostly only applies to psychological trait differences, both because those are a sensitive subject, and because it's easier to motivatedly see what you want to see in them: whereas things like height or skin tone can be directly seen and uncontroversially measured with well-understood physical instruments (like a meterstick or digital photo pixel values), psychological assessments are much more complicated and therefore hard to detach from the eye of the beholder. (If I describe Mary as "warm, compassionate, and agreeable", the words mean something in the sense that they change what experiences you anticipate—if you believed my report, you would be surprised if Mary were to kick your dog and make fun of your nose job—but the things that they mean are a high-level statistical signal in behavior for which we don't have a simple measurement device like a meterstick to appeal to if you and I don't trust each other's character assessments of Mary.)

Notice how the "not allowing sex and race differences in psychological traits to appear on shared maps is the Schelling point for resistance to sex- and race-based oppression" hypothesis actually gives us an explanation for why one might reasonably have a sense that there are dread doors that we must not open. Undermining the "everyone is Actually Equal" Schelling point could catalyze a preference cascade—a slide down the slippery slope to the next Schelling point, which might be a lot worse than the status quo on the "amount of rape and genocide" metric, even if it does slightly better on "estimating heritability coefficients." The orthodoxy isn't just being dumb for no reason. In analogy, Galileo and Darwin weren't trying to undermine Christianity—they had much more interesting things to think about—but religious authorities were right to fear heliocentrism and evolution: if the prevailing coordination equilibrium depends on lies, then telling the truth is a threat and it is disloyal. And if the prevailing coordination equilibrium is basically good, then you can see why purported truth-tellers striking at the heart of the faith might be believed to be evil.

Murray opens the parts of the book about sex and race with acknowledgments of the injustice of historical patriarchy ("When the first wave of feminism in the United States got its start [...] women were rebelling not against mere inequality, but against near-total legal subservience to men") and racial oppression ("slavery experienced by Africans in the New World went far beyond legal constraints [...] The freedom granted by emancipation in America was only marginally better in practice and the situation improved only slowly through the first half of the twentieth century"). It feels ... defensive? (To his credit, Murray is generally pretty forthcoming about how the need to write "defensively" shaped the book, as in a sidebar in the introduction that says that he'd prefer to say a lot more about evopsych, but he chose to just focus on empirical findings in order to avoid the charge of telling just-so stories.)

But this kind of defensive half-measure satisfies no one. From the oblivious-science-nerd perspective—the view that agrees with Murray that "everyone should calm down"—you shouldn't need to genuflect to the memory of some historical injustice before you're allowed to talk about Science. But from the perspective that cares about Justice and not just Truth, an insincere gesture or a strategic concession is all the more dangerous insofar as it could function as camouflage for a nefarious hidden agenda. If your work is explicitly aimed at destroying the anti-oppression Schelling-point belief, a few hand-wringing historical interludes and bromides about human equality having no testable implications (!!) aren't going to clear you of the suspicion that you're doing it on purpose—trying to destroy the anti-oppression Schelling point in order to oppress, and not because anything that can be destroyed by the truth, should be.

And sufficient suspicion makes communication nearly impossible. (If you know someone is lying, their words mean nothing, not even as the opposite of the truth.) As far as many of Murray's detractors are concerned, it almost doesn't matter what the text of Human Diversity says, how meticulously researched of a psychology/neuroscience/genetics lit review it is. From their perspective, Murray is "hiding the ball": they're not mad about this book; they're mad about specifically chapters 13 and 14 of a book Murray coauthored twenty-five years ago. (I don't think I'm claiming to be a mind-reader here; the first 20% of The New York Times's review of Human Diversity is pretty explicit and representative.)

In 1994's The Bell Curve: Intelligence and Class Structure in American Life, Murray and coauthor Richard J. Herrnstein argued that a lot of variation in life outcomes is explained by variation in intelligence. Some people think that folk concepts of "intelligence" or being "smart" are ill-defined and therefore not a proper object of scientific study. But that hasn't stopped some psychologists from trying to construct tests purporting to measure an "intelligence quotient" (or IQ for short). It turns out that if you give people a bunch of different mental tests, the results all positively correlate with each other: people who are good at one mental task, like listening to a list of numbers and repeating them backwards ("reverse digit span"), are also good at others, like knowing what words mean ("vocabulary"). There's a lot of fancy linear algebra involved, but basically, you can visualize people's test results as a hyperellipsoid in some high-dimensional space where the dimensions are the different tests. (I rely on this "configuration space" visual metaphor so much for so many things that when I started my secret ("secret") gender blog, it felt right to put it under a .space TLD.) The longest axis of the hyperellipsoid corresponds to the "g factor" of "general" intelligence—the choice of axis that cuts through the most variance in mental abilities.
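For readers who would rather see the "longest axis" idea than take it on faith, here is a minimal sketch. It takes a made-up correlation matrix for four tests (the numbers and the test names are assumptions chosen only so that every pair correlates positively) and finds its first principal axis by power iteration. This is plain principal-component extraction, a simplification of the factor-analytic methods psychometricians actually use, so treat it as an illustration of the geometry rather than as how IQ tests are built.

```python
from math import sqrt

# Hypothetical correlation matrix for four mental tests. The values are
# invented for illustration; only the all-positive pattern matters here.
TESTS = ["digit_span", "vocabulary", "matrices", "arithmetic"]
CORR = [
    [1.0, 0.4, 0.3, 0.5],
    [0.4, 1.0, 0.5, 0.4],
    [0.3, 0.5, 1.0, 0.4],
    [0.5, 0.4, 0.4, 1.0],
]


def first_principal_axis(matrix, iterations=200):
    """Power iteration: return (largest eigenvalue, its unit eigenvector)."""
    n = len(matrix)
    v = [1.0 / sqrt(n)] * n  # start from an arbitrary all-positive unit vector
    for _ in range(iterations):
        # Multiply by the matrix, then rescale back to unit length.
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient gives the eigenvalue for the converged direction.
    eigenvalue = sum(
        v[i] * sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)
    )
    return eigenvalue, v


if __name__ == "__main__":
    eigenvalue, loadings = first_principal_axis(CORR)
    # For a correlation matrix the total variance equals the number of tests.
    print(f"share of variance on the first axis: {eigenvalue / len(CORR):.2f}")
    for name, loading in zip(TESTS, loadings):
        print(f"{name:>12}: {loading:.2f}")
```

Because every correlation in the toy matrix is positive, every test gets a positive weight on the first axis, which is the "positive manifold" pattern described above.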

It's important not to overinterpret the g factor as some unitary essence of intelligence rather than the length of a hyperellipsoid. It seems likely that if you gave people a bunch of physical tests, they would positively correlate with each other, such that you could extract a "general factor of athleticism". (It would be really interesting if anyone's actually done this using the same methodology used to construct IQ tests!) But athleticism is going to be a very "coarse" construct for which the tails come apart: for example, world champion 100-meter sprinter Usain Bolt's best time in the 800 meters is reportedly only around 2:10 or 2:07! (For comparison, I ran a 2:08.3 in high school once!)

Anyway, so Murray and Herrnstein talk about this "intelligence" construct, and how it's heritable, and how it predicts income, school success, not being a criminal, &c., and how Society is becoming increasingly stratified by cognitive abilities, as school credentials become the ticket to the new upper class.

This should just be more social-science nerd stuff, the sort of thing that would only draw your attention if, like me, you feel bad about not being smart enough to do algebraic topology and want to console yourself by at least knowing about the Science of not being smart enough to do algebraic topology. The reason everyone and her dog is still mad at Charles Murray a quarter of a century later is Chapter 13, "Ethnic Differences in Cognitive Ability", and Chapter 14, "Ethnic Inequalities in Relation to IQ". So, apparently, different ethnic/"racial" groups have different average scores on IQ tests. Ashkenazi Jews do the best, which is why I sometimes privately joke that the fact that I'm only 85% Ashkenazi (according to 23andMe) explains my low IQ. (I got a 131 on the WISC-III at age 10, but that's pretty dumb compared to some of my robot-cult friends.) East Asians do a little better than Europeans/"whites". And—this is the part that no one is happy about—the difference between U.S. whites and U.S. blacks is about Cohen's d ≈ 1. (If two groups differ by d = 1 on some measurement that's normally distributed within each group, that means that the mean of the group with the lower average measurement is at the 16th percentile of the group with the higher average measurement, or that a uniformly-randomly selected member of the group with the higher average measurement has a probability of about 0.76 of having a higher measurement than a uniformly-randomly selected member of the group with the lower average measurement.)
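The two conversions in that parenthetical can be checked in a few lines. This is a minimal sketch assuming both groups are normally distributed with equal variance (the same assumption the parenthetical makes); the function names are made up for this example.

```python
from math import sqrt
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def percentile_of_lower_mean(d: float) -> float:
    """Fraction of the higher-scoring group that falls below the lower group's mean."""
    return STANDARD_NORMAL.cdf(-d)


def probability_of_superiority(d: float) -> float:
    """Chance that a random member of the higher group outscores a random member of the lower."""
    # The difference of two independent unit-variance normals has variance 2,
    # so the standardized gap is d / sqrt(2).
    return STANDARD_NORMAL.cdf(d / sqrt(2))


if __name__ == "__main__":
    d = 1.0
    print(f"{percentile_of_lower_mean(d):.3f}")    # about 0.159
    print(f"{probability_of_superiority(d):.3f}")  # about 0.760
```

Plugging in d = 0 gives 0.5 for both, which is a useful sanity check: no difference in means implies a coin flip.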

Given the tendency for people to distort shared maps for political reasons, you can see why this is a hotly contentious line of research. Even if you take the test numbers at face value, racists trying to secure unjust privileges for groups that score well, have an incentive to "play up" group IQ differences in bad faith even when they shouldn't be relevant. As economist Glenn C. Loury points out in The Anatomy of Racial Inequality, cognitive abilities decline with age, and yet we don't see a moral panic about the consequences of an aging workforce, because older people are construed by the white majority as an "us"—our mothers and fathers—rather than an outgroup. Individual differences in intelligence are also presumably less politically threatening because "smart people" as a group aren't construed as a natural political coalition—although Murray's work on cognitive class stratification would seem to suggest this intuition is mistaken.

It's important not to overinterpret the IQ-scores-by-race results; there are a bunch of standard caveats that go here that everyone's treatment of the topic needs to include. Again, just because variance in a trait is statistically associated with variance in genes within a population, does not mean that differences in that trait between populations are caused by genes: remember the illustrations about sun-deprived plants and internet-deprived red-haired children. Group differences in observed tested IQs are entirely compatible with a world in which those differences are entirely due to the environment imposed by an overtly or structurally racist society. Maybe the tests are culturally biased. Maybe people with higher socioeconomic status get more opportunities to develop their intellect, and racism impedes socio-economic mobility. And so on.

The problem is, a lot of the blank-slatey environmentally-caused-differences-only hypotheses for group IQ differences start to look less compelling when you look into the details. "Maybe the tests are biased", for example, isn't an insurmountable defeater to the entire endeavor of IQ testing—it is itself a falsifiable hypothesis, or can become one if you specify what you mean by "bias" in detail. One idea of what it would mean for a test to be biased is if it's partially measuring something other than what it purports to be measuring: if your test measures a combination of "intelligence" and "submission to the hegemonic cultural dictates of the test-maker", then individuals and groups that submit less to your cultural hegemony are going to score worse, and if you market your test as unbiasedly measuring intelligence, then people who believe your marketing copy will be misled into thinking that those who don't submit are dumber than they really are. But if so, and if not all of your individual test questions are equally loaded on intelligence and cultural-hegemony, then the cultural bias should show up in the statistics. If some questions are more "fair" and others are relatively more culture-biased, then you would expect the order of item difficulties to differ by culture: the "item characteristic curve" plotting the probability of getting a biased question "right" as a function of overall test score should differ by culture, with the hegemonic group finding it "easier" and others finding it "harder". Conversely, if the questions that discriminate most between differently-scoring cultural/ethnic/"racial" groups were the same as the questions that discriminate between (say) younger and older children within each group, that would be the kind of statistical clue you would expect to see if the test was unbiased and the group difference was real.

Hypotheses that accept IQ test results as unbiased, but attribute group differences in IQ to the environment, also make statistical predictions that could be falsified. Controlling for parental socioeconomic status only cuts the black–white gap by a third. (And note, on the hereditarian model, some of the correlation between parental SES and child outcomes is due to both being causally downstream of genes.) The mathematical relationship between between-group and within-group heritability means that the conjunction of wholly-environmentally-caused group differences, and the within-group heritability, makes quantitative predictions about how much the environments of the groups differ. Skin color is actually only controlled by a small number of alleles, so if you think Society's discrimination on skin color causes IQ differences, you could maybe design a clever study that measures both overall-ancestry and skin color, and does statistics on what happens when they diverge. And so on.

In mentioning these arguments in passing, I'm not trying to provide a comprehensive lit review on the causality of group IQ differences. (That's someone else's blog.) I'm not (that?) interested in this particular topic, and without having mastered the technical literature, my assessment would be of little value. Rather, I am ... doing some context-setting for the problem I am interested in, of fixing public discourse. The reason we can't have an intellectually-honest public discussion about human biodiversity is because good people want to respect the anti-oppression Schelling point and are afraid of giving ammunition to racists and sexists in the war over the shared map. "Black people are, on average, genetically less intelligent than white people" is the kind of sentence that pretty much only racists would feel good about saying out loud, independently of its actual truth value. In a world where most speech is about manipulating shared maps for political advantage rather than getting the right answer for the right reasons, it is rational to infer that anyone who entertains such hypotheses is either motivated by racial malice, or is at least complicit with it—and that rational expectation isn't easily canceled with a pro forma "But, but, civil discourse" or "But, but, the true meaning of Equality is unfalsifiable" disclaimer.

To speak to those who aren't already oblivious science nerds—or are committed to emulating such, as it is scientifically dubious whether anyone is really that oblivious—you need to put more effort into your excuse for why you're interested in these topics. Here's mine, and it's from the heart, though it's up to the reader to judge for herself how credible I am when I say this—

I don't want to be complicit with hatred or oppression. I want to stay loyal to the underlying egalitarian–individualist axiology that makes the blank slate doctrine sound like a good idea. But I also want to understand reality, to make sense of things. I want a world that's not lying to me. Having to believe false things—or even just not being able to say certain true things when they would otherwise be relevant—extracts a dire cost on our ability to make sense of the world, because you can't just censor a few forbidden hypotheses—you have to censor everything that implies them, and everything that implies them: the more adept you are at making logical connections, the more of your mind you need to excise to stay in compliance.

We can't talk about group differences, for fear that anyone arguing that differences exist is just trying to shore up oppression. But ... structural oppression and actual group differences can both exist at the same time. They're not contradicting each other! Like, the fact that men are physically stronger than women (on average, but the effect size is enormous, like d ≈ 2.6 for total muscle mass) is not unrelated to the persistence of patriarchy! (The ability to credibly threaten to physically overpower someone, gives the more powerful party a bargaining advantage, even if the threat is typically unrealized.) That doesn't mean patriarchy is good; to think so would be to commit the naturalistic fallacy of attempting to derive an ought from an is. No one would say that famine and plague are good just because they, too, are subject to scientific explanation. This is pretty obvious, really? But similarly, genetically-mediated differences in cognitive repertoires between ancestral populations are probably going to be part of the explanation for why we see the particular forms of inequality and oppression that we do, just as a brute fact of history devoid of any particular moral significance, like how part of the explanation for why European conquest of the Americas happened earlier and went smoother for the invaders than the colonization of Africa, had to do with the disease burden going the other way (Native Americans were particularly vulnerable to smallpox, but Europeans were particularly vulnerable to malaria).

Again—obviously—is does not imply ought. In deference to the historically well-justified egalitarian fear that such hypotheses will primarily be abused by bad actors to portray their own group as "superior", I suspect it's helpful to dwell on science-fictional scenarios in which the boot of history is on one's own neck, if the boot does not happen to be on one's own neck in real life. If a race of lavender humans from an alternate dimension were to come through a wormhole and invade our Earth and cruelly subjugate your people, you would probably be pretty angry, and maybe join a paramilitary group aimed at overthrowing lavender supremacy and re-instantiating civil rights. The possibility of a partially-biological explanation for why the purple bastards discovered wormhole generators when we didn't (maybe they have d ≈ 1.8 on us in visuospatial skills, enabling their population to be first to "roll" a lucky genius (probably male) who could discover the wormhole field equations), would not make the conquest somehow justified.

I don't know how to build a better world, but it seems like there are quite general grounds on which we should expect that it would be helpful to be able to talk about social problems in the language of cause and effect, with the austere objectivity of an engineering discipline. If you want to build a bridge (that will actually stay up), you need to study "the careful textbooks [that] measure [...] the load, the shock, the pressure [that] material can bear." If you want to build a just Society (that will actually stay up), you need a discipline of Actual Social Science that can publish textbooks, and to get that, you need the ability to talk about basic facts about human existence and make simple logical and statistical inferences between them.

And no one can do it! ("Well for us, if even we, even for a moment, can get free our heart, and have our lips unchained—for that which seals them hath been deep-ordained!") Individual scientists can get results in their respective narrow disciplines; Charles Murray can just barely summarize the science to a semi-popular audience without coming off as too overtly evil to modern egalitarian moral sensibilities. (At least, the smarter egalitarians? Or, maybe I'm just old.) But at least a couple aspects of reality are even worse (with respect to naïve, non-renormalized egalitarian moral sensibilities) than the ball-hiders like Murray can admit, having already blown their entire Overton budget explaining the relevant empirical findings.

Murray approvingly quotes Steven Pinker (a fellow ball-hider, though Pinker is better at it): "Equality is not the empirical claim that all groups of humans are interchangeable; it is the moral principle that individuals should not be judged or constrained by the average properties of their group."

A fine sentiment. I emphatically agree with the underlying moral intuition that makes "Individuals should not be judged by group membership" sound like a correct moral principle—one cries out at the monstrous injustice of the individual being oppressed on the basis of mere stereotypes of what other people who look like them might statistically be like.

But can I take this literally as the exact statement of a moral principle? Technically?—no! That's actually not how epistemology works! The proposed principle derives its moral force from the case of complete information: if you know for a fact that I have moral property P, then it would be monstrously unjust to treat me differently just because other people who look like me mostly don't have moral property P. But in the real world, we often—usually—don't have complete information about people, or even about ourselves.

Bayes's theorem (just a few inferential steps away from the definition of conditional probability itself, barely worthy of being called a "theorem") states that for hypothesis H and evidence E, P(H|E) = P(E|H)P(H)/P(E). This is the fundamental equation that governs all thought. When you think you see a tree, that's really just your brain computing a high value for the probability of your sensory experiences given the hypothesis that there is a tree, multiplied by the prior probability that there is a tree, as a fraction of all the possible worlds that could be generating your sensory experiences.

What goes for seeing trees, goes the same for "treating individuals as individuals": the process of getting to know someone as an individual, involves your brain exploiting the statistical relationships between what you observe, and what you're trying to learn about. If you see someone wearing an Emacs tee-shirt, you're going to assume that they probably use Emacs, and asking them about their dot-emacs file is going to seem like a better casual conversation-starter compared to the base rate of people wearing non-Emacs shirts. Not with certainty—maybe they just found the shirt in a thrift store and thought it looked cool—but the shirt shifts the probabilities implied by your decisionmaking.

The problem that Bayesian reasoning poses for naïve egalitarian moral intuitions, is that, as far as I can tell, there's no philosophically principled reason for "probabilistic update about someone's psychology on the evidence that they're wearing an Emacs shirt" to be treated fundamentally differently from "probabilistic update about someone's psychology on the evidence that she's female". These are of course different questions, but to a Bayesian reasoner (an inhuman mathematical abstraction for getting the right answer and nothing else), they're the same kind of question: the correct update to make is an empirical matter that depends on the actual distribution of psychological traits among Emacs-shirt-wearers and among women. (In the possible world where most people wear tee-shirts from the thrift store that looked cool without knowing what they mean, the "Emacs shirt → Emacs user" inference would usually be wrong.) But to a naïve egalitarian, judging someone on their expressed affinity for Emacs is good, but judging someone on their sex is bad and wrong.
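To make the "the correct update is an empirical matter" point concrete, here is a small sketch of the Emacs-shirt inference under two hypothetical worlds. Every number in it is invented for illustration (the base rate of Emacs users, and how often users and non-users wear the shirt); only the formula comes from Bayes's theorem as stated above.

```python
def posterior(prior: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """P(H|E) = P(E|H) P(H) / P(E), with P(E) expanded over H and not-H."""
    p_e = p_e_given_h * prior + p_e_given_not_h * (1.0 - prior)
    return p_e_given_h * prior / p_e


# Hypothetical share of people who use Emacs (an assumption, not a statistic).
BASE_RATE = 0.02

# World A: the shirt is worn almost exclusively by actual users.
informative = posterior(BASE_RATE, p_e_given_h=0.10, p_e_given_not_h=0.001)

# World B: the thrift-store world, where users and non-users wear the shirt
# at nearly the same rate because it just looks cool.
uninformative = posterior(BASE_RATE, p_e_given_h=0.011, p_e_given_not_h=0.010)

if __name__ == "__main__":
    print(f"World A: P(uses Emacs | shirt) = {informative:.2f}")
    print(f"World B: P(uses Emacs | shirt) = {uninformative:.3f}")
```

The same function handles any observable feature: what changes between cases is only the likelihoods you plug in, which is exactly why the size of the update has to be settled by looking at the world.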

I used to be a naïve egalitarian. I was very passionate about it. I was eighteen years old. I am—again—still fond of the moral sentiment, and eager to renormalize it into something that makes sense. (Some egalitarian anxieties do translate perfectly well into the Bayesian setting, as I'll explain in a moment.) But the abject horror I felt at eighteen at the mere suggestion of making generalizations about people just—doesn't make sense. It's not even that it shouldn't be practiced (it's not that my heart wasn't in the right place), but that it can't be practiced—that the people who think they're practicing it are just confused about how their own minds work.

Give people photographs of various women and men and ask them to judge how tall the people in the photos are, as Nelson et al. 1990 did, and people's guesses reflect not only the photo-subjects' actual heights, but also (to a lesser degree) their sex. Unless you expect people to be perfect at assessing height from photographs (when they don't know how far away the cameraperson was standing, aren't "trigonometrically omniscient", &c.), this behavior is just correct: men really are taller than women on average, so P(true-height|apparent-height, sex) ≠ P(true-height|apparent-height) because of regression to the mean (and women and men regress to different means). But this all happens subconsciously: in the same study, when the authors tried height-matching the photographs (for every photo of a woman of a given height, there was another photo in the set of a man of the same height) and telling the participants about the height-matching and offering a cash reward to the best height-judge, more than half of the stereotyping effect remained. It would seem that people can't consciously readjust their learned priors in reaction to verbal instructions pertaining to an artificial context.
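The "regress to different means" step can be sketched with the textbook normal-prior, normal-noise update. This assumes true heights are Gaussian within each sex and that the apparent height in a photo is the true height plus Gaussian noise; the means, spreads, and noise levels below are hypothetical round numbers in centimeters, not figures from Nelson et al.

```python
def posterior_mean(
    apparent: float, prior_mean: float, prior_sd: float, noise_sd: float
) -> float:
    """Best guess of true height from a noisy apparent height and a normal prior."""
    # The observation's weight shrinks as the noise grows relative to the prior.
    weight_on_observation = prior_sd**2 / (prior_sd**2 + noise_sd**2)
    return prior_mean + weight_on_observation * (apparent - prior_mean)


# Hypothetical (mean, standard deviation) of height by sex, in centimeters.
PRIORS = {"female": (163.0, 7.0), "male": (177.0, 7.0)}


def estimates(apparent: float, noise_sd: float) -> dict:
    """Posterior-mean height estimate for each sex, given the same apparent height."""
    return {
        sex: posterior_mean(apparent, mean, sd, noise_sd)
        for sex, (mean, sd) in PRIORS.items()
    }


if __name__ == "__main__":
    for noise_sd in (5.0, 10.0):
        guess = estimates(apparent=170.0, noise_sd=noise_sd)
        print(
            f"noise sd {noise_sd}: "
            f"female {guess['female']:.1f}, male {guess['male']:.1f}"
        )
```

With the same apparent height, the two estimates are pulled toward different group means, and the pull gets stronger as the visual cue gets noisier. With zero noise the prior drops out entirely and the estimate is just the observation.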

Once you understand at a technical level that probabilistic reasoning about demographic features is both epistemically justified, and implicitly implemented as part of the way your brain processes information anyway, then a moral theory that forbids this starts to look less compelling? Of course, statistical discrimination on demographic features is only epistemically justified to exactly the extent that it helps get the right answer. Renormalized-egalitarians can still be properly outraged about the monstrous tragedies where I have moral property P but I can't prove it to you, so you instead guess incorrectly that I don't just because other people who look like me mostly don't, and you don't have any better information to go on—or tragedies in which a feedback loop between predictions and social norms creates or amplifies group differences that wouldn't exist under some other social equilibrium.

Nelson et al. also found that when the people in the photographs were pictured sitting down, then judgments of height depended much more on sex than when the photo-subjects were standing. This too makes Bayesian sense: if it's harder to tell how tall an individual is when they're sitting down, you rely more on your demographic prior. In order to reduce injustice to people who are an outlier for their group, one could argue that there is a moral imperative to seek out interventions to get more fine-grained information about individuals, so that we don't need to rely on the coarse, vague information embodied in demographic stereotypes. The moral spirit of egalitarian–individualism mostly survives in our efforts to hug the query and get specific information with which to discriminate amongst individuals. (And discriminate—to distinguish, to make distinctions—is the correct word.) If you care about someone's height, it is better to precisely measure it using a meterstick than to just look at them standing up, and it is better to look at them standing up than to look at them sitting down. If you care about someone's skills as a potential employee, it is better to give them a work-sample test that assesses the specific skills that you're interested in, than it is to rely on a general IQ test, and it's far better to use an IQ test than to use mere stereotypes. If our means of measuring individuals aren't reliable or cheap enough, such that we still end up using prior information from immutable demographic categories, that's a problem of grave moral seriousness—but in light of the mathematical laws governing reasoning under uncertainty, it's a problem that realistically needs to be solved with better tests and better signals, not by pretending not to have a prior.

This could take the form of finer-grained stereotypes. If someone says of me, "Taylor Saotome-Westlake? Oh, he's a man, you know what they're like," I would be offended—I mean, I would if I still believed that getting offended ever helps with anything. (It never helps.) I'm not like typical men, and I don't want to be confused with them. But if someone says, "Taylor Saotome-Westlake? Oh, he's one of those IQ 130, mid-to-low Conscientiousness and Agreeableness, high Openness, left-libertarian American Jewish atheist autogynephilic male computer programmers; you know what they're like," my response is to nod and say, "Yeah, pretty much." I'm not exactly like the others, but I don't mind being confused with them.

The other place where I think Murray is hiding the ball (even from himself) is in the section on "reconstructing a moral vocabulary for discussing human differences." (I agree that this is a very important project!) Murray writes—

I think at the root [of the reluctance to discuss immutable human differences] is the new upper class's conflation of intellectual ability and the professions it enables with human worth. Few admit it, of course. But the evolving zeitgeist of the new upper class has led to a misbegotten hierarchy whereby being a surgeon is better in some sense of human worth than being an insurance salesman, being an executive in a high-tech firm is better than being a housewife, and a neighborhood of people with advanced degrees is better than a neighborhood of high-school graduates. To put it so baldly makes it obvious how senseless it is. There shouldn't be any relationship between these things and human worth.

I take strong issue with Murray's specific examples here—as an incredibly bitter autodidact, I care not at all for formal school degrees, and as my fellow nobody pseudonymous blogger Harold Lee points out, many of those stuck in the technology rat race aspire to escape to a more domestic- and community-focused life not unlike that of a housewife. But after quibbling with the specific illustrations, I think I'm just going to bite the bullet here?

Yes, intellectual ability is a component of human worth! Maybe that's putting it baldly, but I think the alternative is obviously senseless. The fact that I have the ability and motivation to (for example, among many other things I do) write this cool science–philosophy blog about my delusional paraphilia where I do things like summarize and critique the new Charles Murray book, is a big part of what makes my life valuable—both to me, and to the people who interact with me. If I were to catch COVID-19 next month and lose 40 IQ points due to oxygen-deprivation-induced brain damage and not be able to write blog posts like this one anymore, that would be extremely terrible for me—it would make my life less-worth-living. (And this kind of judgment is reflected in health and economic policymaking in the form of quality-adjusted life years.) And my friends who love me, love me not as an irreplaceably-unique-but-otherwise-featureless atom of person-ness, but because my specific array of cognitive repertoires makes me a specific person who provides a specific kind of company. There can't be such a thing as literally unconditional love, because to love someone in particular, implicitly imposes a condition: you're only committed to love those configurations of matter that constitute an implementation of your beloved, rather than someone or something else.

Murray continues—

The conflation of intellectual ability with human worth helps to explain the new upper class's insistence that inequalities of intellectual ability must be the product of environmental disadvantage. Many people with high IQs really do feel sorry for people with low IQs. If the environment is to blame, then those unfortunates can be helped, and that makes people who want to help them feel good. If genes are to blame, it makes people who want to help them feel bad. People prefer feeling good to feeling bad, so they engage in confirmation bias when it comes to the evidence about the causes of human differences.

I agree with Murray that this kind of psychology explains a lot of the resistance to hereditarian explanations. But as long as we're accusing people of motivated reasoning, I think Murray's solution is engaging in a similar kind of denial, but just putting it in a different place. The idea that people are unequal in ways that matter is legitimately too horrifying to contemplate, so liberals deny the inequality, and conservatives deny that it matters. But I think if you really understand the fact–value distinction and see that the naturalistic fallacy is, in fact, a fallacy (and not even a tempting one), that the progress of humankind has consisted of using our wits to impose our will on an indifferent universe, then the very concept of "too horrifying to contemplate" becomes a grave error. The map is not the territory: contemplating doesn't make things worse; not-contemplating that which is already there can't make things better—and can blind you to opportunities to make things better.

Recently, Richard Dawkins spurred a lot of criticism on social media for pointing out that selective breeding would work on humans (that is, succeed at increasing the value of the traits selected for in subsequent generations), for the same reasons it works on domesticated nonhuman animals—while stressing, of course, that he deplores the idea: it's just that our moral commitments can't constrain the facts. Intellectuals with the reading-comprehension skill, including Murray, leapt to defend Dawkins and concur on both points—that eugenics would work, and that it would obviously be terribly immoral. And yet no one seems to bother explaining or arguing why it would be immoral. Yes, obviously murdering and sterilizing people is bad. But if the human race is to continue and people are going to have children anyway, those children are going to be born with some distribution of genotypes. There are probably going to be human decisions that do not involve murdering and sterilizing people that would affect that distribution—perhaps involving selection of in vitro fertilized embryos. If the distribution of genotypes were to change in a way that made the next generation grow up happier, and healthier, and smarter, that would be good for those children, and it wouldn't hurt anyone else! Life is not a zero-sum game! This is pretty obvious, really? But if no one except nobody pseudonymous bloggers can even say it, how are we to start the work?

The author of the Xenosystems blog mischievously posits five stages of knowledge of human biodiversity (in analogy to the famous, albeit reportedly lacking in empirical support, five-stage Kübler-Ross model of grief), culminating in Stage 4: Depression ("Who could possibly have imagined that reality was so evil?") and Stage 5: Acceptance ("Blank slate liberalism really has been a mountain of dishonest garbage, hasn't it? Guess it's time for it to die ...").

I think I got stuck halfway between Stage 4 and 5? It can simultaneously be the case that reality is evil, and that blank slate liberalism contains a mountain of dishonest garbage. That doesn't mean the whole thing is garbage. You can't brainwash a human with random bits; they need to be specific bits with something good in them. I would still be with the program, except that the current coordination equilibrium is really not working out for me. So it is with respect for the good works enabled by the anti-oppression Schelling point belief, that I set my sights on reorganizing at the other Schelling point of just tell the goddamned truth—not in spite of the consequences, but because of the consequences of what good people can do when we're fully informed. Each of us in her own way.

Peering Through Reverent Fingers

Any evolutionary advantage must come from a feature affecting our behavior. Thus, there is no evolutionary advantage to simply having a belief about our identity. Self-identity can matter and could have mattered only if it affects behavior, in which case it is really a process of self-identification. Moreover, it is not a matter of affirming a self-identity that we possess. For a belief that needs to be affirmed is not a belief at all.

—Joseph M. Whitmeyer, "How Evolutionary Psychology Can Contribute to Group Process Research", in The Oxford Handbook of Evolution, Biology, and Society

As an atheist, I'm not really a fan of religions, but I'll give them one thing: at least their packages of delusions are stable. The experience of losing your religion is a painful one, but once you've overcome the trauma of finding out that everything you believed was a lie, the process of figuring out how to live among the still-faithful now that you are no longer one of them, is something you only have to do once; it's not like everyone will have adopted a new Jesus Two while you were off having your crisis of faith. And the first Jesus was invisible anyway; you won't be able to pray sincerely, and that does set you apart from your—the—community, but your day-to-day life will be mostly unaffected.

The progressive Zeitgeist does not even offer this respite. Getting over psychological-sex-differences denialism was painful, but after many years of study and meditation, I think I've finally come to accept the horrible truth: women and men really are psychologically different. This sets me apart from the community, but not very much. The original lie wasn't invisible exactly, but it never caused too many problems, because it's easy to doublethink around. Most of the functional use of sex categories in Society is handled by seamless subconscious reference-classing, without anyone needing to consciously, verbally reason about sex differences: no one actually makes the same predictions or decisions about women and men—that would be crazy—but since you don't have direct introspective access to what computations your brain used to cough up a prediction or decision, you can just assume that you're treating everyone equally, and only rarely does the course of ordinary events force you to acknowledge or even notice the lie.

But in the decade I had my back turned reading science books, my former quasi-religion somehow came up with new lies: now, it's not enough to believe that women and men are mentally the same, you're also supposed to accept that those categories refer to some atomic mental property that can only be known by verbal self-report. But this actually breaks the mechanism that made the first lie so harmless: the shear stress of your prediction-and-decision classifier disagreeing with the punishment signals that the intelligent social web is using to train your pronoun-selection classifier throws the previously-backgrounded existence of the former into sharp relief. You really are expected to believe in Jesus Two! And it's far more ridiculous than the first one! I'm never going to get over this!

The Reverse Murray Rule

In the notes to his Real Education, Charles Murray proposes a convention for third-person singular pronouns where the sex of the referent is unknown or irrelevant—

As always, I adhere to the Murray Rule for dealing with third-person singular pronouns, which prescribes using the gender of the author or principal author as the default, and I hope in vain that others will adopt it.

The Murray Rule is a fine illustration of the use of conventions to break the symmetry between arbitrary choices: instead of having to flip a coin every time you want to talk about a hypothetical human in the third person, you pick a convention once, and let the convention pick the pronouns—and furthermore, Murray is proposing, you can use the sex of the author as an "input" to achieve determinism without the traditional sexism of the universal generic masculine or its distaff counterpart favored by some modern academics.

But even this still leaves us with one information-theoretic bit of freedom—one binary choice not yet determined, between the Murray Rule (female authors use the generic feminine; male authors use generic masculine) and the Reverse Murray Rule (female authors use generic masculine; male authors use generic feminine).

I'll concede that the Murray Rule is a more natural Schelling point on account of grouping "like with like": the generic hypothetical person's gender matching the author's seems to require less of a particular rationale than the other way around. But I much prefer the Reverse Murray Rule on æsthetic grounds. The implicit assumption that authors regard their own sex as the normal, default case feels ... chauvinistic. And kind of gay. Women and men were made for each other. It is wrong to regard the opposite sex as some irrelevant alien, rather than an alternate self. That's why I tend to reach for the generic feminine when I'm being formal enough to eschew singular they, and the real reason I write "women and men" in that order. I like to imagine my hypothetical female analogue doing the opposite—or rather, doing the same thing—using male-first orderings and the generic masculine on the same verbalized rationale and analogous motivations in her own history ... even though she doesn't, can't exist.

Don't Read the Comments??

Historically, The Scintillating But Ultimately Untrue Thought has not provided a comment section. There were two reasons for this.

First, technical limitations, downstream of technical æsthetics. There are standard out-of-the-box blogging hosts—your WordPress, your Medium, &c.—that are easy for anyone to use, at the cost of taking control away from the user, locking access to your soul away on someone else's server, or, at best, obfuscated in some database behind opaque gobs of PHP. My real-name blog (started in December 2011, when I was much less technically adept) is still running WordPress, and I'm sad about it. In contrast, this blog is produced using the Pelican static site generator from Markdown text files, versioned in Git—simple tools I understand, producing flat HTML files that Nginx can serve. When I don't like something about my theme or my plugins, I'm not at the mercy of the developers; I can just fix it myself. The lack of a database meant forgoing a comment section, but that seemed like a small loss, because—

Second, internet comment sections are garbage and I don't want to be bothered to moderate one. I thought, people who are actually interested in replying to my writing can write a longform response on their own blog (please?—I'll link back), or on Reddit when I share to /r/TheMotte; and people who want to talk to me can find my email address (checked less often than my real-name email; I regret any delays) on the About page.

So I thought, and yet—first, the same do-it-myself æsthetics that make static-site generators attractive, make me cautiously open to the idea of a comment section that I can configure and host myself, rather than being held commercially hostage by the likes of Disqus. Second, perhaps some small consolation for never being a popular writer (I'm not prolific enough, and occupying too weird of a niche), is that maybe my readership is exclusive and discerning enough for the comments section to not be garbage.

So, as an experiment—no promises or warranties—I've set up an instance of the Isso commenting engine to host a comments section at the bottom of each individual post page.

Don't make me regret this.

Relative Gratitude and the Great Plague of 2020

In the depths of despair over not just having lost the Category War, but having lost it harder and at higher cost than I can even yet say (having not yet applied for clearance from the victors as to how much is my story to tell), I'm actually pretty impressed with how competently my filter bubble is handling the pandemic. When the stakes of getting the right answer for the right reasons, in public, are measured in the hundreds of thousands of horrible suffocation deaths, you can see the discourse usefully move forward on the timescale of days.

In the simplest epidemiology models, the main parameter of interest is called R0, the basic reproduction number: the number of further infections caused by every new infection (at the start of the epidemic, when no one is yet immune). R0 isn't just a property of the disease itself, but also of the population's behavior. If R0 is above 1, the ranks of the infected grow exponentially; if R0 is less than 1, the outbreak peters out.
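To make the threshold at 1 concrete, here is a minimal sketch in Python. It assumes a deliberately crude generation-by-generation picture in which every infection causes exactly R0 further infections and the supply of susceptible people never runs out. The function name `expected_infections` and the sample values 2.5 and 0.8 are my own illustrative choices, not estimates for any real disease.

```python
def expected_infections(r0, generations, initial=1.0):
    """Expected number of new infections in each successive generation.

    Crude assumption: each infection causes r0 further infections,
    and nobody ever becomes immune.
    """
    counts = [initial]
    for _ in range(generations):
        counts.append(counts[-1] * r0)
    return counts


# Illustrative values only.
growing = expected_infections(2.5, 10)    # R0 above 1: exponential growth
shrinking = expected_infections(0.8, 10)  # R0 below 1: the outbreak peters out
```

The same loop produces both behaviors; only which side of 1 the multiplier falls on differs.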

So first the narrative was "flatten the curve": until a vaccine is developed, we can't stop the virus, but with social distancing, frequent handwashing, not touching your face, &c. we can at least lower R0 to slow down the course of the epidemic, making the graph of current infections at time t flatter and wider: if fewer people are sick at the same time, then the hospital system won't be overloaded, and fewer people will die.

The thing is, the various "flatten the curve" propaganda charts illustrating the idea didn't label their axes and depicted the "hospital system capacity" horizontal line above, or at most slightly below, the peak of the flattened curve, suggesting a scenario where mitigation efforts that merely slowed down the spread of the virus through the population would be enough to avoid disaster. Turns out, when you run the numbers, that's too optimistic: at the peak of a merely mitigated epidemic, there will be many times over more people who need intensive care, than ICU beds for them to get it. These cold equations suggest a more ambitious goal of "containment": lock everything down as hard as we need to in order to get R0 below 1, and scurry to get enough testing, contact-tracing, and quarantining infrastructure in place to support gradually restarting the economy without restarting the outbreak.
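What "running the numbers" might look like can be sketched with the textbook SIR compartment model, stepped forward with small Euler steps. To be loud about the assumptions: the ten-day recovery time, the fraction of infected people needing intensive care, and the ICU beds per capita are placeholder values I made up to show the shape of the calculation, not real estimates, and the function names are hypothetical.

```python
def peak_infected_fraction(r0, recovery_days=10.0, days=1000, dt=0.1, seed=1e-6):
    """Peak fraction of the population infected at once in a basic SIR model.

    Integrated with small Euler steps; every parameter here is illustrative.
    """
    gamma = 1.0 / recovery_days   # recovery rate per day
    beta = r0 * gamma             # transmission rate per day
    s, i = 1.0 - seed, seed       # susceptible and infected fractions
    peak = i
    for _ in range(int(days / dt)):
        new_infections = beta * s * i * dt
        recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - recoveries
        peak = max(peak, i)
    return peak


def overload_factor(peak, icu_fraction, beds_per_capita):
    """How many times over ICU demand at the peak exceeds ICU capacity."""
    return peak * icu_fraction / beds_per_capita


# Placeholder inputs, NOT real estimates.
ICU_FRACTION = 0.02       # assumed share of infected people needing intensive care
BEDS_PER_CAPITA = 0.0003  # assumed ICU beds per person

unmitigated = peak_infected_fraction(2.5)
mitigated = peak_infected_fraction(1.5)   # a "flattened" curve
contained = peak_infected_fraction(0.9)   # R0 below 1: never grows past the seed
```

Under these made-up inputs, `overload_factor(mitigated, ICU_FRACTION, BEDS_PER_CAPITA)` still comes out well above 1 even though the mitigated peak is much lower than the unmitigated one, which is the qualitative point of the argument; the specific multiple means nothing.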

The discussion goes on (is it feasible to calibrate the response that finely?—what of the economic cost? &c.)—and that's what impresses me; that's what I'm grateful for. The discussion goes on. Sure, there's lots of the usual innumeracy, cognitive biases, and sheer wishful thinking, but when there's no strategic advantage to "playing dumb"—there's no pro-virus coalition that might gain an advantage if we admit out loud that they said something true—you can see people actually engage each other with the full beauty of our weapons, and, sometimes, change their mind in response to new information. The "flatten the curve" argument isn't "false" exactly (quantitatively slowing down the outbreak will, in fact, quantitatively make the overload on hospitals less bad), but the pretty charts portraying the flattened curve safely below the hospital capacity line were substantively misleading, and it was possible for someone to spend a bounded and small amount of effort to explain, "Hey, this is substantively misleading because ..." and be heard, to the extent that the people who made one of the most popular "flatten the curve" charts published an updated version reflecting the new argument.

This level of performance is ... not to be taken for granted. Take it from me.

Cloud Vision

Google reportedly recently sent out an email to their Cloud Vision API customers, notifying them that the service will stop returning "woman" or "man" labels for people in photos. Being charitable (as one does), I can think of reasons why I might defend or support such a decision. Detecting the sex of humans in images is going to be significantly less reliable than just picking out the humans in the photo, and the way the machines do sex-classification is going to depend on their training corpus, which might contain embedded cultural prejudices that Google might not want to inadvertently use their technological hegemony to reproduce and amplify. Just using a "person" label dodges the whole problem.

I think of my experience playing with FaceApp, the uniquely best piece of software in the world, which lets the user apply neural-network-powered transformations to their photos to see how their opposite-sex analogue would look! (Okay, the software actually has lots of other transformations and filters available—aging, de-aging, add makeup, add beard, lens flare, &c.—but I'm assuming those are just there for plausible deniability.) So, for example, the "Female" transformation hallucinates long hair—but hair length isn't sexually dimorphic the way facial morphology is! At most, the "females have long hair" convention has a large basin of attraction—but the corpus of training photos was taken from a culture following that convention. Is it OK for the AI's concept of womanhood itself to reflect that? There are all sorts of deep and subtle ethical questions about "algorithmic fairness" that could be asked here!

I don't think the deep and subtle questions are being asked. The reigning ideology does not permit itself the expressive power to formulate the deep and subtle questions. "Given that a person's gender cannot be inferred by appearance," reads the email. Cannot be inferred, says Google! This is either insane, or a blatant lie told to appease the insane. Neither bodes well for the future of my civilization. (Contrast to sane versions of the concern, like, "Cannot be inferred with sufficiently high reliability", or, "Can be inferred in most cases, but we're concerned about the social implications of misclassifying edge cases.") I'm used to this shit from support groups at the queer center in Berkeley or in Portland, but I never really took it seriously—never really believed that it could be taken seriously. But Google! Aren't those guys supposed to know math?

Just ... this fucking ideology that assumes everyone has this "gender" thing that's incredibly important for everyone to respect and honor, but otherwise has no particular properties whatsoever. I can sketch out an argument for why, in theory, the ideology is memetically fit: there are at least two (and probably three or four) clusters of motivations for why some humans want to change sex; liberal-individualist Society wants to accommodate them and progressives want to use them as a designated-victim pity-pump, but the inadequacy of the existing continuum of interventions, and perhaps more so the continuity of the menu of available interventions, is such that verbal self-identification ends up being the only stable Schelling point.

But the theory doesn't help me wrap my head around how grown-ups actually believe this shit. Or at least, are too scared to be caught dead admitting out loud that they don't. This is Cultural Revolution shit! This is Lysenko-tier mindfuckery up in here!

And I don't know how to convey, to anyone who doesn't already feel it too, that I'm scared—and that I have a reason to be scared.

I believe that knowledge is useful, and that there are general algorithms—patterns of thinking and talking—that produce knowledge. You can't just get one thing wrong—every wrong answer comes from a bug in your process, and there's an infinite family of other inputs that could trigger the same bug. The calculator that says 6 + 7 = 14 isn't just going to mislead you if you use it to predict what happens when you combine a stack of ●●●●●● pennies and a stack of ●●●●●●● pennies—it's not a calculator. The function-that-it-computes is not arithmetic.

I am not a particularly intelligent man. If I ever seem to be saying true and important things that almost no one else is saying, it's not because I'm unusually insightful, but because I'm unusually bad at keeping secrets. There are ... operators among us, savvy Straussian motherfuckers who know and see everything I can, and more—but who think it doesn't matter that not everybody knows.

And I guess ... I think it matters? One of the evilest reactionary bloggers mentioned the difference between a state religion that requires you to believe in the unseen, and one that requires you to disbelieve in what is seen. My thesis is that a state religion that requires you to fluidly doublethink around the implications of "Some women have penises", will also falter over something even the Straussians have to protect. But I can't prove it.

The COVID-19 news is playing hell with my neuroticism. They say you should stock up on needed prescription drugs, in case of supply-chain disruptions. I guess I'm glad that, unlike some of my friends who I am otherwise jealous of, I'm not dependent on drugs for the hormones that my body needs in order for my bones to not rot. I wish I had known twelve years ago, that accepting that dependency in exchange for its scintillating benefits was an option for cases like mine. There's at least a consistency in this: it's not safe to depend on the supply lines of a system that didn't have the all-around competency to just tell me.

Anyway, besides the Total Culture War over the future of my neurotype tearing apart ten-year friendships and having me plotting to flee my hometown, my life is going pretty okay. I'm getting paid lots of money to sell insurance in Canada, and I have lots of things to look forward to, like the conclusion to the Tangled sequel series, or the conclusion to the Obnoxious Bad Decision Child sequel miniseries, or finishing my forthcoming review of the new Charles Murray book. (It's going to be great—a bid to broaden the topic scope of the blog to "things that only right-wing Bad Guys want to talk about, but without myself being a right-wing Bad Guy" in full generality, not just for autogynephilia and the correspondence of language to reality.)

Basically, I want to live. I know that now. And it's hard to shake the feeling that the forces trying to cloud my vision don't want me to.

If in Some Smothering Dreams You Too Could Pace

[...] and this is a war, and we are soldiers.

Fighting a Total Culture War to prevent your neurotype-demographic from becoming permanent mind-slaves of the Blue Egregore is no excuse for being a jerk.

I mean, it's an explanation, but that's different from an excuse: being a jerk has consequences, and you need to take the consequences like a man.

This, then, is the mindset of a soldier (though our beautiful cutting weapons be words instead of swords): to inflict pain, to incur guilt—and yet to have only tactical regrets. Given the chance to do it all over again, you would—but solely to be more skillful in projecting rhetorical force to secure the objective, to say more clearly what needed to be said. Not to inflict less pain or incur less guilt.

Book Review: Cailin O'Connor's The Origins of Unfairness: Social Categories and Cultural Evolution

This is a super-great book about the cultural evolutionary game theory of gender roles! (And also stuff like race and religion and caste, I guess, but I'm ignoring that because I haven't gotten around to broadening the topic scope of this blog yet.) I am unreasonably excited about this book for supplying the glue of analytical rigor to a part of my world-model that had previously been held together by threads of mere handwaving! (Three years ago on this blog, I wrote, "social-role defaults are inevitably going to accrete around [sex differences]", but I didn't, and couldn't, have told you how and why in a form suitable for verification by computer simulation.)

In this blog post, I'm going to summarize what I learned from Origins of Unfairness in my own words, but if you want to be a serious intellectual who actually reads grown-up books rather than relying on some pseudonymous nobody's blog summary, you should go buy the source material!

A puzzle: every human culture has gender roles and a substantial amount of division of labor by sex. From within a particular culture, it might be tempting to "essentialize" these differences, to think that certain kinds of tasks inherently belong in the separate spheres of women or men, as ordained by the local religion's gods (or perhaps "evolution" if your local religion is pop-evopsych rather than real-evopsych). But anthropologists know that there's huge cross-cultural variation as to the details of what tasks are assigned to which sex. There are some regularities: things like big-game hunting and metalworking are always male tasks, and things like spinning, dairying, and primary child care are "women's work." But there are also a lot of differences: the task of making ropes or pottery is gendered within a culture, but different cultures end up making different assignments.

What's going on here? Why divide labor by sex when either sex is capable of doing the job? Why not let individuals choose their own destinies, independently of how their genitals are shaped?

Observe that the division and specialization of labor is a coordination problem: there are many ways to try to produce stuff, but Society is richer when people choose ways that "fit together": our tribe is more likely to survive if I hunt and you gather or you hunt and I gather, rather than if we try to both hunt (too much variance) or both gather (not enough protein). Moreover, the division of labor is a complementary coordination problem, where we want different people to do different things that fit together (like hunting and gathering in a nomadic society, or cooking and cleaning in a household), in contrast to correlative coordination problems where we want people to all end up doing the same thing that fits together (like driving on the right side of the road, or meeting at noon at the information booth at Grand Central Station).

Consider a population of agents that meet in pairs and play a complementary coordination game, like ballroom dancers that need to decide who should lead and who should follow. It's kind of a pain if every single pair has to separately negotiate roles every time they meet! But if the agents come in two equally numerous types (say, "women" and "men"), then the problem is easy: either of the conventions "men lead, women follow" or "women lead, men follow" solves the problem for everyone!

Of course, "women and men dancing" is just an illustrative example as far as the theory is concerned: the "types" here are just opaque tags that separate otherwise-identical abstract agents into groups. In particular, types are not strategies. In terms of the dancing game, the strategies "lead" and "follow" can't be types: rather, the arbitrary "men" and "women" tags (which might as well be suggestively-named Lisp tokens) are a symmetry-breaking hack that lets us turn many complementary coordination games (for every pair, who should lead?) into a single correlative coordination game (for the whole population, are we using the "men lead" or the "women lead" convention?).

Nor does there need to be a central "dance caller" who specifies which convention the population should follow. If strategies that are more successful are more frequently imitated via social learning, conventions can arise from a process of cultural evolution: in a world where most men happen to lead, women learn to follow in order to have a successful dance, and the population gets swept into the "men lead" convention. A convention's basin of attraction is the set of initial population conditions that lead to the evolution of that convention. When there are many possible equilibria with roughly-equal-sized basins of attraction, the outcome is highly "conventional": things could have easily been otherwise given different initial conditions. (And can even be said to contain more information: "more possible outcomes" and "equally-probable outcomes" are what maximize entropy.) Situations with fewer, unequally-sized basins of attraction are more "functional": the outcome is mostly determined by the game itself.
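The dancing example is simple enough to simulate. What follows is my own minimal sketch, not O'Connor's model or code: it assumes discrete-time two-population replicator dynamics as the stand-in for "successful strategies get imitated more", a payoff of 1 for a lead/follow pairing and 0 otherwise, and an arbitrary imitation `rate`. The names `evolve_convention` and `men_lead_basin_share` are hypothetical.

```python
def evolve_convention(men_lead, women_lead, rate=0.1, steps=2000):
    """Evolve the fractions of men and of women who play "lead".

    Men only dance with women. A dance succeeds (payoff 1) when exactly one
    partner leads, so leading pays in proportion to how many of the other
    type follow.
    """
    x, y = men_lead, women_lead
    for _ in range(steps):
        # Payoff advantage of leading over following, for each type.
        dx = rate * x * (1 - x) * ((1 - y) - y)
        dy = rate * y * (1 - y) * ((1 - x) - x)
        x, y = x + dx, y + dy
    return x, y


def men_lead_basin_share(grid=10):
    """Share of off-diagonal grid starting points that evolve toward "men lead"."""
    starts = [(i / grid, j / grid)
              for i in range(1, grid) for j in range(1, grid) if i != j]
    outcomes = [evolve_convention(x0, y0) for x0, y0 in starts]
    return sum(1 for x, y in outcomes if x > y) / len(starts)


print(evolve_convention(0.6, 0.5))  # slightly more men lead at the start
print(evolve_convention(0.4, 0.5))  # slightly fewer men lead at the start
print(men_lead_basin_share())
```

In this symmetric game, a small initial tilt in either direction is enough to sweep the population to the corresponding convention, and the two basins come out the same size: the "highly conventional" case. Making one pairing pay more than the other would be one way to explore the more "functional" case.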

And that's where gender roles come from! In a Society facing complementary coordination problems in production, gender is the symmetry-breaker around which conventions form. And if skills need to be trained long before they get put into production, that shapes early socialization—in a Society where women do "women's work" to complement "men's work", they're raised to start practicing it as girls.

This is also where gender inequality comes from. In game theory models without types, all agents get the same payoffs in equilibrium. (Because if they didn't, then some strategy must pay better than others—which means more agents will copy it until it doesn't.)

With types, this is no longer true: the population can settle on equilibria that favor the interests of one type over another (but are better for everyone than the absence of coordination), like an "always Bach" convention in the Bach–Stravinsky game, or in the aggregation of many games that the type tags are being used for.

This is especially true if we drop the assumption that the type "tags" have no in-game significance (other than being visible for coordination) and introduce an asymmetric payoff matrix. Consider the Nash bargaining game: two agents have to decide how to divide a pie with 10 slices, but if their demands are incompatible (like when I demand 7 slices and you also demand 7 slices, but 7 + 7 = 14 is greater than 10), then the pie explodes, and no one gets any pie. If different types of agents have different fallback options, that affects their incentives in the bargaining game: if you wouldn't have anything to eat if you didn't get any pie, then you might want to make a conservative demand, like 3 slices, in order to ensure that you get some pie even if it turns out that I'm a greedy jerk who demands 7 slices. But if I have a sandwich that's as valuable to me as 2½ slices of pie, then I'm not particularly worried about you being a greedy jerk who demands 7 slices: to me, the difference between a successful 3-slice demand and failing to make a deal at all is only half a slice, which gives me an incentive to demand more, because I have less to lose than you if bargaining fails.
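Here is the pie arithmetic as a minimal Python sketch. The assumptions are mine: only the two demands mentioned above (3 and 7 slices) are on the menu, and the other party is a greedy jerk with some probability `p_greedy`, for which 0.6 is an arbitrary illustrative value. The function names are hypothetical.

```python
PIE = 10  # slices


def payoffs(my_demand, your_demand, my_fallback=0.0, your_fallback=0.0):
    """Compatible demands are met; otherwise the pie explodes and
    each side is left with only its fallback option."""
    if my_demand + your_demand <= PIE:
        return my_demand, your_demand
    return my_fallback, your_fallback


def expected_payoff(my_demand, fallback, p_greedy):
    """Expected payoff against someone who demands 7 with probability
    p_greedy and 3 otherwise (an assumed, simplified opponent)."""
    total = 0.0
    for their_demand, prob in ((7, p_greedy), (3, 1 - p_greedy)):
        mine, _ = payoffs(my_demand, their_demand, my_fallback=fallback)
        total += prob * mine
    return total


def best_demand(fallback, p_greedy):
    """The better of the two demands considered here, 3 or 7 slices."""
    return max((3, 7), key=lambda d: expected_payoff(d, fallback, p_greedy))


print(best_demand(fallback=0.0, p_greedy=0.6))  # nothing else to eat
print(best_demand(fallback=2.5, p_greedy=0.6))  # has the sandwich
```

Facing the same uncertainty about the other side, the agent with nothing to fall back on does better with the conservative 3-slice demand, while the agent holding the 2½-slice sandwich does better demanding 7.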

This kind of dynamic explains the differences in women's roles between patriarchal "plow cultures" (in which men do agriculture with plows) and non-patriarchal "hoe cultures" (in which women do horticulture with hoes): a coordination equilibrium in which Society's primary means of sustenance is considered "women's work" gives women more negotiating power as a class. (Even when individual women in a patriarchal Society have high privilege (e.g., earning power), they're still women as far as conventions are concerned.)

The path of cultural evolution is affected not only by the types' bargaining power: the relative speed of adaptation between types can matter, too! The Red Queen hypothesis describes an evolutionary advantage to a species that can evolve quickly, the better to keep up in an evolutionary arms race against parasites. (As it happens, this may have been a key factor in the evolution of sexual reproduction—the reason, along with the dynamic instability of equal-sized gametes, that "females" and "males" even exist to begin with, rather than all organisms being asexual clones.) But in bargaining-like situations, there can be a "Red King" effect in which there's an advantage in evolving slowly. Much like how visibly throwing away your steering wheel is an advantage in the game of Chicken (that precommitment forcing your opponent to swerve in response), the type that is slower to adapt to its "counterparty" type is effectively more resistant to its bargaining demands. As O'Connor puts it, "we can think of a fast-evolving species as swerving in evolutionary time."

Similarly, when a minority group (for example, women in a male-dominated workplace) interacts with a majority, a large fraction of a minority group member's interactions will be with members of the majority: the minority learns to adapt to the majority much faster than vice versa, placing the evolutionarily implicit norm negotiation on the majority's terms.
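The asymmetry is just arithmetic, sketched here under the assumption that partners are drawn uniformly at random from a large population. The 10% minority share is an illustrative value and `cross_group_share` is a hypothetical name.

```python
def cross_group_share(minority_fraction):
    """Share of each group's interactions that are with the other group,
    assuming uniformly random pairing in a large population."""
    minority_meets_majority = 1.0 - minority_fraction
    majority_meets_minority = minority_fraction
    return minority_meets_majority, majority_meets_minority


# Illustrative value: a 10% minority.
minority_share, majority_share = cross_group_share(0.1)
```

With a 10% minority, nine in ten of a minority member's interactions are cross-group, against one in ten for a majority member, so the minority gets about nine times as many occasions per person to learn the other side's demands.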

A sign of high-integrity scholarship is when the positive insights contained in a work can be appreciated independently of the author's normative agenda (if any). O'Connor, like me—at least, I hope my self-identification in this matter is still valid, although the reader will ultimately judge that for herself—writes from a position of having a glorious vision of gender equality as Something to Protect, her mighty pen wielded in the service of that ideal in an act of heroic scholarship.

But having Something to Protect is the same thing as having something in danger. This is—as mathematical sociology treatises go—a very dark book. O'Connor repeatedly emphasizes that the theory presented in the book shows how inequality can emerge and persist under very minimal conditions—with "no bias in [the] model, no stereotype threat, not much psychology in general"—in contrast to theories that present injustice as the consequence of unique malice or prejudice, rather than mathematics.

"Ultimately," she writes, "I will present a picture in which social justice is an endless battle. The forces of cultural evolution can pull populations towards inequity, and combating those forces requires constant vigilance." The book concludes, "The battle for social justice is against a hydra that grows a new head each time any one is cut off."

When I imagine an intelligent arch-reactionary reading Origins of Unfairness (perhaps twiddling his mustache during an hour of study between a 2:30 dog-kicking appointment and 4 o'clock advocacy of a Trump coup d'état), I see him nodding along thoughtfully at the lucid prose explaining the underlying game theory insights (in between cringing at the occasional Judith Butler and stereotype-threat cites). That man, in the service of callously protecting his personal power and privilege, might construe Origins as "supporting" his ideology.

"Bwah-ha-ha!" he laughs maniacally. "I already knew that feminism was doomed simply due to the nature and meaning of male and female—but I had no idea it was further doomed as a result of the cultural evolutionary game theory of complementary coordination problems! And this, from one of the corrupt leftist establishment's own scholaresses! Priceless!"

That's how you know it's a good book. The map that reflects the territory is equally useful to good people and to bad men. Good and evil—as we would define those terms—exist in the same material universe, whose exceptionless physical laws contain no provision for biologically and culturally evolved human notions of mercy or fairness. The long arc of the moral universe points, not towards justice, but towards maximum entropy—just like the arrow of time in every other universe.

A lesser scholar, flinching from this terrible truth, might have seen fit to fudge their results, to select their modeling assumptions to present a softer narrative, something that would make better propaganda for the Blue Team ...

It wouldn't have worked. I mean, it probably would have worked as propaganda, but it wouldn't have worked in the sense of my dream about the use of maps—as scholarship, a beacon through the darkness, showing us the way to start to repair the world we actually live in, and not only the appearance of it.

More Schelling


[A mediator] can influence the other players' expectations on his own initiative, in a manner that both parties cannot help mutually recognizing. When there is no apparent focal point for agreement, he can create one by his power to make a dramatic suggestion. [...]

The white line down the center of the road is a mediator, and very likely it can err substantially toward one side or the other before the disadvantaged side finds advantage in denying its authority. The principle is beautifully illustrated by the daylight-saving-time controversy; a majority that want to do everything an hour earlier just cannot organize to do it unless it gets legislative control of the clock. And when it does, a well-organized minority that opposed the change is usually quite unable to offset the change in clock time by any organized effort to change the nominal hour at which it gets up, eats, and does business.

—Thomas Schelling, Strategy of Conflict, Ch. 5, "Enforcement, Communication, and Strategic Moves"

This explains why the trans-rights fight ends up focusing on language, rather than any particular policy where "gender" is used to make a decision. "What's the harm in calling people what they really want to be called?" goes the argument. "You can still say cis woman when you want to be more specific."

It doesn't work like that. When you change the category associated with a short codeword, you're imposing on all the downstream predictions and decisions people were already using that category/word for—nor have people catalogued all those decisions in advance; they just expect to be able to think using top-20 nouns (coordination signals) that came with their native tongue, much as they expect to be able to think using clock time without needing to compute Earth's rate of rotation relative to the fixed stars. The skew between daylight-savings time and sidereal time would have to get pretty extreme before people started changing their schedules—or, if that were somehow forbidden, started denying the clock's authority and just using the sun.

Reply to Ozymandias on Fully Consensual Gender

With the Hopes that our World is built on
They were utterly out of touch,
They denied that the Moon could be defined to be Stilton;
They denied she identified as Dutch;
They denied that Wishes should be categorized as Horses;
They denied that a Pig could be stipulated to have Wings;
So we worshipped the Gods of Culture
Who promised these beautiful things.

—Rudyard Kipling, "The Gods of the Copybook Headings" (paraphrased)

At the end of their reply to my reply to the immortal Scott Alexander on gender categorization, friend of the blog Ozymandias makes an analogy between social gender and money.1 What constitutes money in a given social context is determined by collective agreement: money is whatever you can reliably expect everyone else to accept as payment. This isn't a circular definition (in the way that "money is whatever we agree is money" would be uninformative to an alien who didn't already have a referent for the word money), and people advocating for a different money regime (like late-19th-century American bimetallists or contemporary cryptocurrency advocates) aren't making an epistemic mistake.

I really like this analogy! An important thing to note here is that while the form of money can vary widely across sociocultural contexts (from shell beads, to silver coins, to fiat paper currency, to database entries in a bank), not just any form will suffice to serve the functions of money: perishable goods like cheese can't function as a long-term store of value; non-fungible items that vary in quality in hard-to-measure ways can't function as a unit of account.2

Because of these constraints, I don't think the money/social-gender analogy can do the work Ozy seems to expect of it. They write:

Similarly, "you're a woman if you identify as a woman!" is not a definition of womanhood. It is a criterion for who should be a woman. It states that our social genders should be fully consensual: that is, if a person says "I would like to be put in the 'woman' category now," you do that. Right now, this criterion is not broadly applied: a trans person's social gender generally depends on their presentation, their secondary sexual characteristics, and how much the cis people around them are paying attention. But perhaps it would improve things if it were.

Following the money analogy, we could imagine someone arguing that our money should be fully consensual: that is, if a person says, "I would like this to be put in the 'dollar' category now," you do that. Right now, this criterion is not broadly applied ... and it's not easy to imagine how it could be applied (a prerequisite to figuring out if perhaps it would improve things if it were). Could I buy a car by offering the dealer a banana and saying, "I would like this to be put in the '$20,000 bill' category now"? What would happen to the economy if everyone did that?

Maybe the hypothetical doesn't have to be that extreme. Perhaps we should imagine someone taking Canadian $5 bills, crossing out "Canada", drawing a beard on Wilfrid Laurier, and saying "I'd like this to be considered an American $5 bill." (Exchange rate at time of writing: 1 Canadian dollar = 0.76 U.S. dollars.) Then imagine that a social norm catches on within a certain subset of Society that it's incredibly rude to question someone who says they're giving you American money, but that this standard hasn't spread to the U.S. government and financial system.

Economists have a name for this kind of situation. Gresham's Law: bad money drives out good. In contexts where custom requires that defaced Canadian dollars be regarded as equivalent to U.S. dollars, maybe everyone will smile and pretend not to notice the difference.

They will be lying. In marketplaces governed by "trans American dollars are American dollars" social norms, smart buyers will prefer to buy with defaced Canadian dollars, and smart sellers will try to find plausibly-deniable excuses to not accept them ("That'll be $5." "Here you go! A completely normal, definitely non-suspicious American $5 bill!" "Ooh, you know what, actually we just sold out"), because everyone knows3 that when it comes time to interact with the larger banking system, the two types of dollars won't be regarded as being of equal value. Never doubting the value of other people's currency may be basic human decency, but if so, the market interprets basic human decency as damage and routes around it.
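To put rough numbers on the incentives in that exchange, here is a minimal sketch using the exchange rate quoted above. The function names are hypothetical, and `Decimal` is used only to keep the cents exact.

```python
from decimal import Decimal

# Exchange rate quoted in the text: 1 Canadian dollar = 0.76 U.S. dollars.
CAD_TO_USD = Decimal("0.76")


def bank_value_usd(face_value, defaced_canadian):
    """What the larger banking system credits for a bill of this face value."""
    face = Decimal(face_value)
    return face * CAD_TO_USD if defaced_canadian else face


def seller_shortfall_usd(face_value):
    """What a seller loses by accepting a defaced Canadian bill at face value."""
    return Decimal(face_value) - bank_value_usd(face_value, defaced_canadian=True)
```

On a $5 sale the buyer hands over something the bank values at $3.80, and the seller eats the remaining $1.20, which is why both sides have a reason to behave differently from what the polite norm says.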

Similarly, there seem to be increasingly large subsets of Society in which it's incredibly rude to question someone's stated gender. But even if everyone says "Trans women are women" and uses the right pronouns solely on the basis of self-reported self-identity with no questions asked and no one batting an eye, it's not clear that this constitutes successfully entering a "fully consensual gender" regime insofar as people following their own self-interest are likely to systematically make decisions that treat non-well-passing trans women as if they were something more like men, even if no one would dream of being so rude as to admit out loud that that's what they're doing.

And how are you going to stop them? Every freedom-to implies the lack of a freedom-from somewhere else, and vice versa: as the cliché goes, your right to swing your fist ends at my nose. "Fully consensual gender" sounds like a good idea when you phrase it like that: what kind of monster could possibly be against consent, or for non-consent?

But the word "consent" is usually used in contexts where an overwhelming asymmetry of interests makes us want to resolve conflicts in a particular direction every time: when we say that all sex should be consensual, we mean that a person's right to bodily autonomy always takes precedence over someone else's mere horniness. Even pointing out that this is (technically, like everything else) a trade-off, feels creepy.

Categorization really doesn't seem like this. If there's a conflict between one person's desire to be modeled as belonging to a particular gender and someone else's perception that the person is more accurately thought of as belonging to a different gender, then it's not clear what it would mean to resolve the conflict in the direction of "consent of the modeled" other than mind control, or at least compelled speech.

Ozy gives a list of predictions you can make about someone on the basis of social gender, as distinct from sex, apparently meant to demonstrate the usefulness of the former concept. But a lot of the individual list items seem either superficial ("Whether they wear dresses, skirts, or makeup"—surely we don't want to go for "gender as clothing", do we??), or tied to other people's perceptions of sex.4 5

Take the "How many messages they get on a dating site" item. The reason men send lots of messages to women on dating sites is because they want to date people with vaginas and female secondary sex characteristics, and maybe eventually marry them, father children with them, &c.6

Suppose one were to say to such a man, "Ah, I see you're sending lots of messages to women, by which I mean people who self-identify as women, in accordance with the utilitarian-desirable social policy of fully-consensual gender. Therefore, you should also send messages to these non-op trans women who aren't on HRT."

I think the man would reply, "How dumb do you think I am?!"7

One might respond with, "But there's a lot of cis women who you also wouldn't date. Therefore, while you're allowed to not date trans women if that's your preference, you can't say it's because they're not women."

So, I think there's actually a statistically sophisticated reply to this which I really need to elaborate on more in future posts. To be sure, our man is just relying on his intuitive perception and probably doesn't know the statistically sophisticated reply8—but it's not clear that we've given him much of a reason to trust our clever verbal arguments over his own perception.

I happily agree that fully consensual gender is a coherent position. That doesn't make it feasible. How are you going to maintain that social equilibrium without it being immediately destroyed by normal people who have eyes and don't care about clever philosophical definition-hacking mind games the way that readers of this blog do?

That's not a rhetorical question. In the case of fiat currency, the question actually has a literal answer, although I personally am not well-versed enough in economic history to tell it. Somehow, societies have evolved from a condition in which the idea of paper currency would have provoked a "How dumb do you think I am?" reaction, to the present condition where everyone and her dog accepts paper money as money without a thought—where the "somehow" probably involves the use of state violence to enforce banking regulations.

Ozy concludes—

Since it is not, properly speaking, a definition, the decision of who should be socially gendered male or female, and how many social genders we should have is not an epistemic decision. This decision can and should be made on purely utilitarian grounds.

In some sense, this is kind of unobjectionable—what kind of monster could possibly be against utility?!—but it's an incredibly vague sense. The decision of what kind of money we should have should be made on purely utilitarian grounds, but the set of possible solutions to that problem, and how well each solution performs with respect to the global utilitarian calculus, is very tightly constrained by many facts of economics and sociology.9

So too with gender. "Utilitarian grounds" does not mean, "I and some other people have an unconstrained utopian vision, and we'll be very dysphoric if you don't implement it, so the global utilitarian calculus says you should obey us." To be sure, your dysphoria is a cost under the global utilitarian calculus—but it's just one of many costs and benefits in a complex system. If someone actually wants to do a careful psychologically- and sociologically-informed analysis of how a "fully consensual gender" regime could actually be implemented in real life,10 and what impact it would have in terms of QALYs, that would be really interesting to read!

Until then, the question remains: how dumb do you think we are?!


  1. As teased at the beginning of the bulleted list in my post-Christmas cry of pain last year, I also have responses to the other arguments Ozy makes earlier in "Man Should Allocate Some More Categories". The fact that the present post focuses specifically on replying to the gender/money analogy shall not be construed to mean that I'm conceding any other points—just that I'm a ludicrously, miserably unproductive writer. (Compare the June 2018 date of Ozy's post to the December 2019 (!) date of this one.)
  2. E.g., my goat might be healthier than your goat in a way that neither of us nor any of the other local goat-herders know how to quantify.
  3. Except not everyone knows. What actually happens is that the original "U.S. dollar" concept coexists with the debased one, and savvy people who understand what's going on can arbitrage the equivocation to expropriate from those who are less savvy.
  4. The harassment and expected-sacrifices examples in particular are what radical feminists would call sex-based oppression.
  5. Friend of the blog Ray Blanchard proposed on Twitter that the term "subjective sex" might be more useful than "gender".
  6. And the fact that it's women being deluged with messages from men rather than vice versa is predicted by the evolutionary logic of Bateman's principle and parental investment theory: the sex that invests more resources per offspring will be "choosier", and the sex that invests less will compete for them. There are a few species (like the pipefish or the Eurasian dotterel) in which males are the more-investing sex, but humans aren't one of them.
  7. This isn't necessarily trans-exclusionary—many such men might be happy to date trans women who were on HRT and thereby came to more closely resemble cis/natal/actual women. But that just gets us back to passing (like I was trying to say thousands of words ago), not fully consensual gender.
  8. Although I would argue that the sophisticated statistics are part of the cognitive-scientific explanation of what he perceives.
  9. For example, fiat money lets central banks exert greater control over the money supply, but can suffer disastrous hyperinflation under the wrong conditions.
  10. As I observed recently, fully consensual gender would at least have the advantage of being a Schelling point. Oh, and speaking of "real life", I happily concede that the social-engineering problem of fully consensual gender is much easier in online communities, where pesky easy-to-detect/expensive-to-change secondary sex characteristics are hidden behind the fog of net. In other words, on the internet, nobody knows you're a G.I.R.L.


(-romise, -ensation)

(epistemic status: shitposting)

Some radical feminists complain that males body-modding to become facsimiles of women is appropriation. They're obviously correct, but the claim doesn't seem strong enough to override the right to bodily autonomy. The Law in its majesty finds a precedent in the archives of copyright law: bands recording cover songs are appropriating the work of the original composer, but the composer's claim to control and be compensated for their work doesn't seem strong enough to override the band's right to artistic expression.

The solution in either case: compulsory licensing! A small tax is administered on transition services (hormone replacement therapy, facial feminization surgery, &c.), the proceeds of which are distributed equally amongst all natal females (as if they held a collective patent on the female form).

Promises I Can Keep

"I think if you show any indications of being an egg, you need to marry someone who's OK with you eventually transitioning. Not because you necessarily will want to transition, but because it's likely enough that you need to plan for it."

"I probably don't actually disagree. Keeping promises is very important. What a betrayal it would be—to take someone to have and to hold, for better or for worse, for richer or for poorer, in sickness and in health—only to throw it all away when the cost curve moves? No. But in the spirit of policy debates not appearing one-sided, I would like to register a note of sadness that we're effectively thereby saying 'eggs don't deserve love.'"

"Eggs can have love, they just add a constraint."

"I don't think you understand the seriousness of 'just' adding a constraint to a search problem that is already very constrained. Constraint: must own unicorn."

"I Want to Be the One"

This life is not to last and it awaits apotheosis
And the passerby all sipping on their Monday coffee know this
Their stumbling through their week
Contrasts the path by which I seek
A practical ambition
For a special type of girl
I want to be the one who writes the code
That writes the code
That writes the code
That ends the world

sheet music

On the Argumentative Form "Super-Proton Things Tend to Come In Varieties"

"[...] Between one and the infinite in cases such as these, there are no sensible numbers. Not only two, but any finite number, is ridiculous and can't exist."

The Gods Themselves by Isaac Asimov

Eliezer Yudkowsky Tweets (back in March), linking to a Quillette interview with Lisa Littman (positer of "rapid onset gender dysphoria"):

Everything more complicated than protons tends to come in varieties. Hydrogen, for example, has isotopes. Gender dysphoria involves more than one proton and will probably have varieties.

To be clear, I don't know much about gender dysphoria. There's an allegation that people are reluctant to speciate more than one kind of gender dysphoria. To the extent that's not a strawman, I would say only in a generic way that GD seems liable to have more than one species.

So, I actually think the moral here is wrong! (Subtly wrong, in a way that took me a day or two to notice at the time, and am blogging about now.)

It's true that "in the real world, nothing above the level of [protons] repeats itself exactly." But when we say that a psychological or medical diagnosis "comes in varieties," we're talking about distinct taxa/clusters, not the mere existence of variation due to things not being identical down to the atomic scale; otherwise, the observation that something "comes in varieties" would be trivial. And Occam's razor/minimum-message-length says that we shouldn't postulate more explanatory entities (such as categories) unless they can pay rent in better predictions.

There's a "zero–one–infinity"-like reductio ad absurdum argument to be made here. Suppose we observe some people wake up with their left arm turned into a blue tentacle. We might want to coin a term like tentacular brachitis to summarize our observations.

The one comes to us and says, "Everything more complicated than protons tends to come in varieties. Tentacular brachitis involves more than one proton and will probably have varieties."

This, in itself, doesn't tell us anything useful about what those varieties might be ... but suppose we do some more research and indeed find that patients' tentacles have a distinct cluster structure. Not only is there covariance between different tentacle features—perhaps tentacles that are a darker shade of blue also tend to be slimier—but the color–sliminess joint distribution is starkly bimodal: modeling the tentacles as coming from two distinct "dark-blue/slimy" and "light-blue/less-slimy" taxa is a better statistical fit than positing a linear darkness/sliminess continuum. So, congratulating ourselves on a scientific job-well-done, we speciate our diagnosis into two: "Tentacular brachitis A" and "Tentacular brachitis B".
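What "a better statistical fit" could mean here can be sketched in code. The following is a toy illustration under loud assumptions: the data are synthetic, the two features are collapsed into a single made-up "darkness" score, a single Gaussian stands in for the continuum hypothesis, and the two-taxa hypothesis is a two-component Gaussian mixture fit by a bare-bones EM loop. All names are hypothetical. The comparison uses the Bayesian information criterion (BIC), which charges each model for its extra parameters.

```python
import math
import random


def normal_logpdf(x, mean, sd):
    """Log density of a normal distribution at x."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def loglik_continuum(xs):
    """Log-likelihood of one Gaussian fit by maximum likelihood (2 parameters)."""
    mean = sum(xs) / len(xs)
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs))
    return sum(normal_logpdf(x, mean, sd) for x in xs)


def loglik_two_taxa(xs, iterations=100):
    """Log-likelihood of a two-component Gaussian mixture fit by EM (5 parameters)."""
    lo, hi = min(xs), max(xs)
    means = [lo + (hi - lo) / 4, hi - (hi - lo) / 4]
    sds = [(hi - lo) / 4, (hi - lo) / 4]
    weights = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: each component's responsibility for each point.
        resp = []
        for x in xs:
            joint = [w * math.exp(normal_logpdf(x, m, s))
                     for w, m, s in zip(weights, means, sds)]
            total = sum(joint)
            resp.append([j / total for j in joint])
        # M-step: re-estimate each component from its weighted points.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk
            sds[k] = max(math.sqrt(var), 1e-6)
            weights[k] = nk / len(xs)
    return sum(
        math.log(sum(w * math.exp(normal_logpdf(x, m, s))
                     for w, m, s in zip(weights, means, sds)))
        for x in xs
    )


def bic(loglik, n_params, n_points):
    """Bayesian information criterion; lower is better."""
    return n_params * math.log(n_points) - 2 * loglik


def sample_tentacle_darkness(seed=0, per_taxon=200):
    """Synthetic, starkly bimodal 'darkness' scores (made-up data)."""
    rng = random.Random(seed)
    light = [rng.gauss(0.0, 1.0) for _ in range(per_taxon)]
    dark = [rng.gauss(6.0, 1.0) for _ in range(per_taxon)]
    return light + dark


if __name__ == "__main__":
    xs = sample_tentacle_darkness()
    print("one cluster BIC:", bic(loglik_continuum(xs), 2, len(xs)))
    print("two taxa BIC:   ", bic(loglik_two_taxa(xs), 5, len(xs)))
```

On data this cleanly separated, the two-taxa model wins even after paying for three extra parameters. The penalty term is the point: a split has to earn its keep in predictive fit before it counts as a real variety.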

The one comes back to us and says, "Everything more complicated than protons tends to come in varieties. Tentacular brachitis A involves more than one proton and will probably have varieties."

You see the problem. We have an infinite regress: the argument that the original category will probably need to be split into subcategories, goes just as well for each of the subcategories.

So isn't "Gender dysphoria involves more than one proton[; therefore, it] will probably have varieties" a fake explanation? The phrase "gender dysphoria" was worth inventing as a shorter code for the not-vanishingly-rare observation of "humans wanting to change sex", but unless and until you have specific observations indicating that there are meaningfully different ways dysphoria can manifest, you shouldn't posit that there are "probably" multiple varieties, because in a "nearby" Everett branch where human evolution happened slightly differently, there probably aren't: brain-intersex conditions have a kind of a priori plausibility to them, but whatever weird quirk leads to autogynephilia probably wouldn't happen with every roll of the evolutionary dice if you rewound far enough, and the memeplex driving Littman's ROGD observations was invented recently.

So I think a better moral than "Things larger than protons will probably have varieties" would be "Beware fallacies of compression." The advice to be alert to the possibility that your initial category should be split into multiple subspecies is correct and important and well-taken, but the reason it's good advice is not because things are made of protons (!?!).

At this point, some readers might be thinking, "Wait a minute, M. Taylor! Didn't you notice that part about 'There's an allegation that people are reluctant to speciate more than one kind of gender dysphoria'? That's your hobbyhorse! Even if Yudkowsky doesn't know you exist, by publicly offering a general argument that there are multiple types of dysphoria, he's effectively doing your cause a favor—and here you are criticizing him for it! Isn't that disloyal and ungrateful of you?"

Great question! And the answer is: no, absolutely not. (And, though I can never speak for anyone but myself, I can only imagine that Yudkowsky would agree? Everything I do, I learned from him.) And the reason it's not disloyal and ungrateful is because the entire mindset in which arguments can constitute a political favor is a confusion. The map is not the territory; what's true is already so. You can't make something become true by arguing for it; you can only use arguments to figure out what's true.

The fact that not everybody knows this makes it especially important for me to loudly and publicly dispute bad arguments whose conclusion I think is true for other reasons. I don't want to trick people into accepting my bottom line for fake reasons! What I want is for us all to get better at anticipating our experiences. Together.

The Strategy of Stigmatization

One common reaction by the Blanchpilled to autogynephilia-truther sites—I mean, the shouty sensationalist kind run by conservatives or radical feminists that almost never use phrases like "uselessly low-dimensional subspace", not The Scintillating But Ultimately Untrue Thought—goes like this:

That is going to make the problem worse. We need to support honest autogynephiles earnestly trying to live satisfying and good lives. Don't try to shame them—we need more of them!

But what constitutes "the problem" depends on your goals, and the best response further depends on historically contingent features of the political environment.

A toy model: suppose there are three life trajectories available to AGP natal males:

(1) Stay in the closet and quietly live in shame forever,
(2) Transition but be transmedicalist/assimilationist/gatekeepy about it (think of this as the Debbie Hayton or Anne Lawrence model), or
(3) Go all-in on trans activism ("Some women have penises, get over it", &c.; the Danielle Muscato or Rachel McKinnon model).

Which trajectory is taken is going to be partially influenced by incentives.

"This is going to make the problem worse" expresses the concern that the likes of /r/itsafetish push people from (2) to (3): if the option of both acknowledging and acting on AGP is "taken off the table", then the trans-activism coalition can "offer a better deal" than quietly living in shame forever.

But from the perspective of hard-core TERFs, (2) itself is already a loss: they're trying to push people from (2) to (1). Whether that's a strategic mistake on their part depends on whether the (2)→(3) "radicalization effect" is larger than the (2)→(1) "stigmatization effect". If it is a mistake in the Current Year (because it's better to seek favorable terms of surrender rather than risk the victor's wrath when the war is already effectively lost), it might not have been in Current Year Minus Five, or Minus Ten, &c., when the coalition backing (3) was less powerful and therefore had a weaker bid.
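The comparison in this toy model fits in a few lines of code. This is just a sketch with made-up numbers: the percentages are hypothetical parameters rather than estimates of anything, and "mistake" is judged from the hard-core perspective described above, on which (3) is the worst outcome.

```python
def apply_stigma(counts, pct_to_closet, pct_to_activism):
    """Move percentages of trajectory (2) into trajectories (1) and (3).

    counts maps trajectory number (1, 2, 3) to a head count.
    """
    if pct_to_closet < 0 or pct_to_activism < 0 or pct_to_closet + pct_to_activism > 100:
        raise ValueError("percentages must be non-negative and sum to at most 100")
    to_closet = counts[2] * pct_to_closet // 100
    to_activism = counts[2] * pct_to_activism // 100
    return {
        1: counts[1] + to_closet,
        2: counts[2] - to_closet - to_activism,
        3: counts[3] + to_activism,
    }


def is_strategic_mistake(pct_to_closet, pct_to_activism):
    """True when the (2)->(3) radicalization effect exceeds the
    (2)->(1) stigmatization effect."""
    return pct_to_activism > pct_to_closet
```

The same stigmatization effect can come out on either side of the comparison depending on how strong a bid the coalition backing (3) can make, which is the sense in which the answer is historically contingent.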

Political Science Epigrams

If your policy is, "We don't negotiate with terrorists, but we do appease bears", then from the perspective of a third party fighting a war against the bears, you look like a productive asset being farmed by the bears, and thus, a legitimate military target.

If your behavior is optimized to respond to political threats, but not to small requests from your friends, at some point your friends start to face a strong incentive to stop being your friends and start threatening you politically, because you've made it clear from your behavior that that's all you respond to.

If you were angry at an enemy (who used to be a friend), you might throw a rock at them. But if they didn't react to the last rock, you need to patiently build a bigger rock.

Self-Identity Is a Schelling Point

Previously on The Scintillating But Ultimately Untrue Thought ("The Categories Were Made for Man to Make Predictions", "Reply on Adult Human Females"), we've considered at length the ways in which the self-identity criterion for gender (e.g., "Women are people who identify as women") fails to satisfy some of the basic desiderata for useful categories: the cognitive function of categories is to group similar things together so that our brains can make similar predictions about them under conditions of uncertainty. In order to make the case that it's useful to think and speak such that "identifying as" a gender is the same thing as being of that gender, one would need to show that those who identify as a gender form a natural cluster in configuration space—and not just a uselessly low-dimensional subspace thereof. ("Identifies as a woman" clusters with "prefers she/her pronouns", but if there's nothing else you can say about such people, then it's not clear why we care.)

Interestingly, an extension of this line of reasoning suggests an apparently novel argument in favor of the self-identity criterion—and which might go part of the way towards explaining many people's favorable attitudes towards the self-identity criterion, even if they've never formulated the argument explicitly. Let me explain.

(And please don't tell me you're surprised that I'm inventing novel arguments for the position I've spent the last twenty months of my life obsessively arguing against! Policy debates should not appear one-sided: it is by means of searching for and weighing all relevant arguments, that one computes the optimal policy, and even generally terrible positions will have some arguments supporting them. What did you take me for, some kind of partisan hack?!)

As, um, my favorite author on Less Wrong explains, another desideratum for intersubjectively useful categories is being easy for different people to coordinate on: in order to work together and think together, we don't just want to choose predictively-useful category boundaries, we also want to make the same choices.

The author gives the age of majority as an example. Presumably the right to vote should be based on relevant features of a person (in a word, "maturity"), not how many times the Earth has gone around the sun since they were born. But it wouldn't be practical for everyone to come to consensus on how to assess "maturity", whereas it is practical for everyone to come to consensus on how to subtract dates, so our shared socially-constructed category of "legal adulthood" ends up being defined in terms of a semi-arbitrary age cut-off, at the cost of mature 16-year-olds and immature 20-year-olds losing out on or gaining privileges that they should or shouldn't have (respectively).
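The reason date subtraction is easy to agree on is that it is a short, mechanical procedure. A minimal sketch follows; the cut-off of 18 is an assumed default (jurisdictions differ), and the function names are hypothetical.

```python
from datetime import date


def age_in_years(born, today):
    """Whole years elapsed between two dates."""
    had_birthday_this_year = (today.month, today.day) >= (born.month, born.day)
    return today.year - born.year - (0 if had_birthday_this_year else 1)


def is_legal_adult(born, today, cutoff_years=18):
    """Bright-line test: the only inputs are two dates and a number."""
    return age_in_years(born, today) >= cutoff_years
```

Two people who disagree about everything else will still compute the same answer here, whereas no comparably short function exists for "maturity". (One wrinkle this sketch glosses over: someone born on February 29 is treated as turning a year older on March 1 in non-leap years, which is itself a convention that has to be picked.)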

When people need to coordinate on making the same arbitrary-on-the-merits choice, they tend to converge on an option that is (for whatever reason) unusually salient. This is the concept of a "Schelling point", after famed economist Thomas Schelling, who posed the question of where strangers should attempt to meet in New York, if they couldn't communicate to pick a rendezvous point in advance. The plurality answer turns out to be "noon at the information booth at Grand Central Station", not because of any properties that make Grand Central Station an objectively superior meeting place that you would pick even if you could communicate in advance, but just because its centrality makes it the focus of reasonable mutual expectations about what you and your partner are likely to do. Similarly, noon is salient as the midpoint of the day. There's no particular reason to meet at noon rather than 9 a.m. or 11 a.m. or 3 p.m., except that choosing 9 or 11 or 3 would seem to demand a particular reason that you expect your counterpart to be able to derive independently.
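The value of a salient option can be made quantitative. If two people choose independently from the same distribution over meeting places, the chance that they pick the same one is the sum of the squared probabilities. The sketch below uses made-up probabilities (they are not Schelling's survey figures), and the place names other than Grand Central are arbitrary stand-ins.

```python
from fractions import Fraction


def meeting_probability(choice_probs):
    """Chance that two independent choosers, each drawing from the same
    distribution over options, pick the same option: sum of p_i squared."""
    if sum(choice_probs.values()) != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(p * p for p in choice_probs.values())


# Hypothetical distributions over four candidate meeting places.
NO_FOCAL_POINT = {
    "Grand Central": Fraction(1, 4),
    "Times Square": Fraction(1, 4),
    "Empire State Building": Fraction(1, 4),
    "Central Park": Fraction(1, 4),
}
WITH_FOCAL_POINT = {
    "Grand Central": Fraction(7, 10),
    "Times Square": Fraction(1, 10),
    "Empire State Building": Fraction(1, 10),
    "Central Park": Fraction(1, 10),
}
```

With four equally likely options the pair meets a quarter of the time; concentrating seven-tenths of the probability on one salient option roughly doubles that, even though nothing about the place itself got better.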

We usually expect the question of what sex (or "gender") a person is to have a canonical answer that everyone agrees on: it would be pretty confusing for bystanders if I thought Pat was a woman and said "Pat ... she" and you thought Pat was a man and said "Pat ... he."

For transgender people who consistently pass, this (ex hypothesi) isn't a problem. Unfortunately, in the absence of magical perfect sex-change technology, not all aspiring trans people pass consistently: the same person might be perceived as their developmental sex or their desired gender, depending on which observer you ask, how long the person has been on hormone replacement therapy, whether the observer knew the person before transition, the current lighting, or any number of other factors.

If, despite this, social reality continues to require the question to have a definite canonical answer, and we can't appeal to "passing" because that's too subjective and blurry, the natural Schelling point is, "Just ask the person what gender they are, and that's what they are." Even if we don't assume that people know themselves better than anyone else, people are still the focal point for reasonable mutual expectations about knowledge about themselves: if I claim to know Pat's gender better than she knows herself, and you claim to know Pat's gender better than he knows himself, then there's no more obvious way to break the symmetry than to defer the question to Pat.

(Notably, this is also the procedure you would use for non-trans people who just happen to be really-really androgynous: you're going to believe their answer to "Are you a woman or a man?" because if you could tell, then you wouldn't have asked.)

Schelling points are "sticky." If the set of possible choices is ordered, and it's possible to "move" from a currently-selected choice towards a "nearby" one, then the selected option may slide down a "slippery slope" until stopping at a Schelling point. Imagine the armies of two countries fighting over contested territory containing a river. The river is a Schelling point for the border between the two countries: unless one of the armies has a military advantage to push the battle line forward to the next Schelling point, we expect peace-treaty negotiations to settle on the river as the border. There's no particular reason that the border couldn't be drawn 2 kilometers north of the river, except that that would invite the question of, "Why 2 kilometers? Why not 1, or 3?"

The coordination problem of how to decide what "gender" a person is can be seen as a particular case of the problem of how to decide what gender a person is in a particular context. The notion of the same person's "gender" being different in different contexts may seem strange, but again, in the absence of magical perfect sex-change technology, we might need it for some purposes: as far as the practice of medicine is concerned, for example, there's no getting around the fact that pregnant trans men are female. (Even if the doctors address the patient as "Mr.", "he", &c., they still need to draw on their mental models of the human female body to practice their craft, which presupposes a referent for the concept of "human female body.")

But here we have a slippery slope on what domains within Society should use developmental-sex categories or self-identity categories.

At one extreme, a "Sex is immutable and determined by the presence of a Y chromosome, no exceptions" regime is a stable Schelling point: if you have a lab that can do karyotypes, there would be no ambiguity on how to classify anyone with respect to the stated category system. (It would be cruel to trans people and people with complete androgen insensitivity syndrome, but it would be a Schelling point.)

At the other extreme, "Self-reported self-identity only, no exceptions" is a stable Schelling point: given the self-identity criterion of "Just ask the person what gender they are", there's no ambiguity about how to classify anyone. (This requires us to affirm the existence of "female penises, female prostates, female sperm, and female XY chromosomes", but it's a Schelling point.)

In contrast, any of a number of "compromise" systems, while potentially performing better on edge cases, suffer from ambiguity and are on that account less game-theoretically stable. It's a lot harder for Society to establish a specific convention of the form "Okay, you can have your pronouns, but you can't use your target-gender {bathroom, locker room, sports league, hospital ward, &c.} unless you {pass really well, get bottom surgery, have a gender recognition certificate, &c.}", not only because different factions will disagree on where to draw the line for each particular gendered privilege, but also because any line not drawn on a sufficiently sticky Schelling point will face constant attempts to push it up or down the slippery slope.

Terminology Proposal: "Developmental Sex"

We need a term to describe the property that cis women and trans men have in common with each other, and that cis men and trans women have in common with each other. I'm unhappy with all three of the most frequently-used alternatives.

The "mainstream" trans-rights answer to this seems to be "assigned sex at birth" or "assigned gender at birth" (hyponyms "assigned female at birth", or a.f.a.b., and "assigned male at birth", a.m.a.b.). The problem with this is that it erases the concept of biological sex. "Assigned" seems (by design?) to suggest that doctors are making an arbitrary, possibly mistaken, choice. With the possible exception of some rare intersex conditions (the context in which the term was originally coined), this isn't the case: when we say that a baby is female, we're not trying to restrict the baby's future social roles or self-conception. We're trying to use language to express the empirical observation that the baby is, in fact, female (of the sex that produces ova).

Correspondingly, trans-skeptical authors (e.g., gender-critical feminists) tend to use "biological sex." This is a lot better than "assigned", but the problem is that it seems to falsely imply that hormone replacement therapy (HRT) isn't "biological." But HRT does have a lot of real biological effects that make trans people resemble their "target" sex in a lot of ways—we don't want our terminology to erase that, either!

Other authors (e.g., the indispensable Anne Lawrence) use "natal sex", but that has the opposite problem: "natal" (of or relating to birth) could be too generous about the extent to which HRT and surgeries actually change someone's sex. (Talking about the historical fact of someone's sex at birth might suggest that it's been successfully changed since.)

My proposal: "developmental sex" (in the sense of developmental biology, "the study of the physiological changes that occur within individual organisms from their conception through reaching physical maturity"). Trans men (respectively women, &c.) weren't only born female; their bodies went through the female developmental trajectory until they transitioned. Hopefully this alternative solves all the problems and will help us communicate more clearly!

Does General Intelligence Deflate Standardized Effect Sizes of Cognitive Sex Differences?

Marco del Giudice1 points out2 that in the presence of measurement error, standardized effect size measures like Cohen's d will underestimate the "true" effect size.

The effect size d tries to quantify the difference between two distributions by reporting the difference between the distributions' means in standardized units—units that have been scaled to take into account how "spread out" the data is. This gives us a common reference scale for how big a given statistical difference is. Height is measured in meters, and "Agreeableness" in the Big Five personality model is an abstract construct that doesn't even have natural units, and yet there's still a meaningful sense in which we can say that the sex difference in height (d≈1.7) is "about three times larger" than the sex difference in Agreeableness (d≈0.5).3

Cohen's d is computed as the difference in group means, divided by the square root of the pooled variance. Thus, holding actual sex differences constant, more measurement error means more variance, which means smaller values of d. Here's some toy Python code illustrating this effect:4

from math import sqrt
from statistics import mean, variance

from numpy.random import normal, seed

# seed the random number generator for reproducibility of figures in later
# comments; comment this out to run a new experiment
seed(1)

def cohens_d(X, Y):
    # difference in means, divided by the square root of the pooled variance
    return (
        (mean(X) - mean(Y)) /
        sqrt(
            (len(X)*variance(X) + len(Y)*variance(Y)) /
            (len(X) + len(Y))
        )
    )

def population_with_error(μ, ε, n):
    def trait():
        return normal(μ, 1)
    def measurement_error():
        return normal(0, ε)
    return [trait() + measurement_error() for _ in range(n)]

# trait differs by 1 standard deviation
true_f = population_with_error(1, 0, 10000)
true_m = population_with_error(0, 0, 10000)

# as above, but with 0.5 standard units of measurement error
measured_f = population_with_error(1, 0.5, 10000)
measured_m = population_with_error(0, 0.5, 10000)

true_d = cohens_d(true_f, true_m)
print(true_d)  # 1.0069180384313943 — d≈1.0, as expected!

naïve_d = cohens_d(measured_f, measured_m)
print(naïve_d)  # 0.9012430127962895 — deflated!
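
As a sanity check on the simulation, here is a sketch of the value we should expect, assuming (as in the toy code above) that the trait and the measurement error are independent. The trait has variance 1 and the error has variance ε², so measured scores have variance 1 + ε², and the expected observed d is the true d divided by the square root of that. The function name `attenuated_d` and its parameters are my own labels, not anything from del Giudice.

```python
from math import sqrt

def attenuated_d(true_d, error_sd, trait_sd=1.0):
    """Expected observed d when independent measurement error with standard
    deviation error_sd is added to a trait with standard deviation trait_sd.
    true_d is expressed in units of trait_sd."""
    observed_sd = sqrt(trait_sd**2 + error_sd**2)
    return true_d * trait_sd / observed_sd

expected = attenuated_d(1.0, 0.5)
print(expected)  # about 0.894
```

That is close to the simulated figure of roughly 0.90, with the remaining gap attributable to sampling noise.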

But doesn't a similar argument hold for non-error sources of variance that are "orthogonal" to the group difference? Suppose performance on some particular cognitive task can be modeled as the sum of the general intelligence factor (zero or negligible sex difference), and a special ability factor that does show sex differences.5 Then, even with zero measurement error, d would underestimate the difference between women and men of the same general intelligence:

def performance(μ_g, σ_g, s, n):
    def general_ability():
        return normal(μ_g, σ_g)
    def special_ability():
        return normal(s, 1)
    return [general_ability() + special_ability() for _ in range(n)]

# ♀ one standard deviation better than ♂ at the special factor
population_f = performance(0, 1, 1, 10000)
population_m = performance(0, 1, 0, 10000)

# ... but suppose we control/match for general intelligence
matched_f = performance(0, 0, 1, 10000)
matched_m = performance(0, 0, 0, 10000)

population_d = cohens_d(population_f, population_m)
print(population_d)  # 0.7413662423265308 — deflated!

matched_d = cohens_d(matched_f, matched_m)
print(matched_d)  # 1.0346898918452228 — as you would expect
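
The same back-of-the-envelope check applies here. This is a sketch under the toy model's assumptions (general and special ability are independent, and special ability has unit variance); the symbols are my own notation. Write δ for the sex difference in the special factor and σ_g for the standard deviation of general ability:

```latex
d_{\text{population}} = \frac{\delta}{\sqrt{\sigma_g^2 + 1}},
\qquad
d_{\text{matched}} = \frac{\delta}{\sqrt{0 + 1}} = \delta
```

With δ = 1 and σ_g = 1, the population-level value is 1/√2 ≈ 0.71, while matching on general ability recovers the full d = 1. The simulated figures above differ from these somewhat because of sampling noise.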


  1. I was telling friend of the blog Tailcalled the other week that we really need to start a Marco del Giudice Fan Club!
  2. Marco del Giudice, "Measuring Sex Differences and Similarities", §2.3.3, "Measurement Error and Other Artifacts"
  3. Yanna J. Weisberg, Colin G. DeYoung, and Jacob B. Hirsh, "Gender Differences in Personality across the Ten Aspects of the Big Five", Table 2
  4. Special thanks to Tailcalled for catching a bug in the initially published version of this code.
  5. Arthur Jensen, The g Factor, Chapter 13: "Although no evidence was found for sex differences in the mean level of g or in the variability of g, there is clear evidence of marked sex differences in group factors and in test specificity. Males, on average, excel on some factors; females on others. [...] But the best available evidence fails to show a sex difference in g."