content/drafts/guess-ill-die.md

   1 Title: Guess I'll Die
   2 Author: Zack M. Davis
   3 Date: 2023-12-30 11:00
   4 Category: commentary
   5 Tags: autogynephilia, bullet-biting, cathartic, Eliezer Yudkowsky, Scott Alexander, epistemic horror, my robot cult, personal, sex differences, two-type taxonomy, whale metaphors
   6 Status: draft
   7
   8 > In desperation he quoted André Gide's remark: "It has all been said before, but you must say it again, since nobody listens." Unfortunately, judging by the quotations given here, Gide's remark is still relevant even today.
   9 >
  10 > —Neven Sesardic, _Making Sense of Heritability_
  11
  12 In a previous post, ["Agreeing With Stalin in Ways that Exhibit Generally Rationalist Principles"](/2023/Dec/agreeing-with-stalin-in-ways-that-exhibit-generally-rationalist-principles/) (the culmination of [three](/2023/Jul/blanchards-dangerous-idea-and-the-plight-of-the-lucid-crossdreamer/) [previous](/2023/Jul/a-hill-of-validity-in-defense-of-meaning/) [posts](/2023/Dec/if-clarity-seems-like-death-to-them/) relating the Whole Dumb Story of my disillusionment with the so-called "rationalist" community), I wrote, "If Eliezer Yudkowsky can't _unambiguously_ choose Truth over Feelings, _then Eliezer Yudkowsky is a fraud_."
  13
  14 But I would be remiss to condemn Yudkowsky without discussing potentially mitigating factors. (I don't want to say that whether someone is a fraud should depend on whether there are mitigating factors—rather, I should discuss potential reasons why being a fraud might be the least-bad choice, when faced with a sufficiently desperate situation.)
  15
  16 [TOC]
  17
  18 ### Short Timelines _vs._ Raising the Sanity Waterline [working § title]
  19
  20 So far, I've been writing from the perspective of caring (and expecting Yudkowsky to care) about human rationality as a cause in its own right—about wanting to make sense, and wanting to live in a Society that made sense, for its own sake, not as a convergently instrumental subgoal of saving the world.
  21
  22 That's pretty much always where I've been at. I _never_ wanted to save the world. I got sucked in to this robot cult because Yudkowsky's philosophy-of-science blogging was just that good. I did do a little bit of work for the Singularity Institute back in the day (a "we don't pay you, but you can sleep in the garage" internship in 2009, some data-entry-like work manually adding Previous/Next links to the Sequences, designing several PowerPoint presentations for Anna Salamon, writing some Python scripts to organize their donor database), but that was because it was my social tribe and I had connections. To the extent that I took at all seriously the whole save/destroy/take-over the world part (about how we needed to encode all of human morality into a recursively self-improving artificial intelligence to determine our entire future light cone until the end of time), I was scared rather than enthusiastic.
  23
  24 Okay, being scared was entirely appropriate, but what I mean is that I was scared, and concluded that shaping the Singularity was not my problem, as contrasted to being scared, then facing up to the responsibility anyway. After a 2013 sleep-deprivation-induced psychotic episode which [featured](http://zackmdavis.net/blog/2013/03/religious/) [futurist](http://zackmdavis.net/blog/2013/04/prodrome/)-[themed](http://zackmdavis.net/blog/2013/05/relativity/) [delusions](http://zackmdavis.net/blog/2013/05/relevance/), I wrote to Anna, Michael Vassar, and some MIRI employees who had been in my contacts for occasional contract work, that "my current plan [was] to just try to forget about _Less Wrong_/MIRI for a long while, maybe at least a year, not because it isn't technically the most important thing in the world, but because I'm not emotionally stable enough [to] think about this stuff anymore". When I got a real programming job and established an income for myself, I [donated to CfAR rather than MIRI](http://zackmdavis.net/blog/2016/12/philanthropy-scorecard-through-2016/), because public rationality was something I could be unambiguously enthusiastic about, and doing anything about AI was not.
  25
  26 At the time, it seemed fine for the altruistically-focused fraction of my efforts to focus on rationality, and to leave the save/destroy/take-over the world stuff to other, more emotionally-stable people, in accordance with the principle of comparative advantage. Yudkowsky had written his Sequences as a dependency for explaining [the need for Friendly AI](https://www.lesswrong.com/posts/GNnHHmm8EzePmKzPk/value-is-fragile), ["gambl\[ing\] only upon the portion of the activism that would flow to \[his\] own cause"](https://www.lesswrong.com/posts/9jF4zbZqz6DydJ5En/the-end-of-sequences), but rationality was supposed to be the [common interest of many causes](https://www.lesswrong.com/posts/4PPE6D635iBcGPGRy/rationality-common-interest-of-many-causes). Even if I wasn't working for or donating to MIRI specifically, I was still _helping_, a good citizen according to the morality of my tribe.
  27
  28 But fighting for public epistemology is a long battle; it makes more sense if you have time for it to pay off. Back in the late 'aughts and early 'tens, it looked like we had time. We had these abstract philosophical arguments for worrying about AI, but no one really talked about timelines. I believed the Singularity was going to happen in the 21st century, but it felt like something to expect in the second half of the 21st century.
  29
  30 Now it looks like we have—less time? Not just tautologically because time has passed (the 21st century is one-fifth over—closer to a quarter over), but because of new information from the visible results of the deep learning revolution.[^second-half] Yudkowsky seemed particularly [spooked by AlphaGo](https://www.lesswrong.com/posts/7MCqRnZzvszsxgtJi/christiano-cotra-and-yudkowsky-on-ai-progress?commentId=gQzA8a989ZyGvhWv2) [and AlphaZero](https://intelligence.org/2017/10/20/alphago/) in 2016–2017, not because superhuman board game players were themselves dangerous, but because of what they implied about the universe of algorithms.
  31
  32 In the Sequences, Yudkowsky had been [dismissive of people who aspired to build AI without understanding how intelligence works](https://www.lesswrong.com/posts/fKofLyepu446zRgPP/artificial-mysterious-intelligence)—for example, by being overly impressed by the [surface analogy](https://www.lesswrong.com/posts/6ByPxcGDhmx74gPSm/surface-analogies-and-deep-causes) between artificial neural networks and the brain. He conceded the possibility of brute-forcing AI (if natural selection had eventually gotten there with no deeper insight, so could we) but didn't consider it a default and especially not a desirable path. (["If you don't know how your AI works, that is not good. It is bad."](https://www.lesswrong.com/posts/fKofLyepu446zRgPP/artificial-mysterious-intelligence))
  33
  34 These days, it's increasingly looking like making really large neural nets ... [actually works](https://www.gwern.net/Scaling-hypothesis)?—which seems like bad news; if it's "easy" for non-scientific-genius engineering talent to shovel large amounts of compute into the birth of powerful minds that we don't understand and don't know how to control, then it would seem that the world is soon to pass outside of our understanding and control.
  35
  36 [^second-half]: In an unfinished slice-of-life short story I started writing _circa_ 2010, my protagonist (a supermarket employee resenting his job while thinking high-minded thoughts about rationality and the universe) speculates about "a threshold of economic efficiency beyond which nothing human could survive" being a tighter bound on future history than physical limits (like the heat death of the universe), and comments that "it imposes a sense of urgency to suddenly be faced with the fabric of your existence coming apart in ninety years rather than 10<sup>90</sup>."
  37
  38     But if ninety years is urgent, what about ... nine? Looking at what deep learning can do in 2023, the idea of Singularity 2032 doesn't seem self-evidently absurd in the way that Singularity 2019 seemed absurd in 2010 (correctly, as it turned out).
  39
  40 My AlphaGo moment was 5 January 2021, when OpenAI released [DALL-E](https://openai.com/blog/dall-e/) (by far the most significant news story of [that week in January 2021](https://en.wikipedia.org/wiki/January_6_United_States_Capitol_attack)). Previous AI milestones, like [GANs](https://en.wikipedia.org/wiki/Generative_adversarial_network) for a fixed image class, felt easier to dismiss as clever statistical tricks. If you have thousands of photographs of people's faces, I didn't feel surprised that some clever algorithm could "learn the distribution" and spit out another sample; I don't know the details, but it doesn't seem like scary "understanding." DALL-E's ability to combine concepts—responding to "an armchair in the shape of an avocado" as a novel text prompt, rather than already having thousands of examples of avocado-chairs and just spitting out another one of those—viscerally seemed more like "real" creativity to me, something qualitatively new and scary.[^qualitatively-new]
  41
  42 [^qualitatively-new]: By mid-2022, DALL-E 2 and Midjourney and Stable Diffusion were generating much better pictures, but that wasn't surprising. Seeing AI being able to do a thing at all is the model update; AI being able to do the thing much better 18 months later feels "priced in."
  43
  44 [As recently as 2020, I had been daydreaming about](/2020/Aug/memento-mori/#if-we-even-have-enough-time) working at an embryo selection company (if they needed programmers—but everyone needs programmers, these days), and having that be my altruistic[^eugenics-altruism] contribution to the Great Common Task. Existing companies working on embryo selection [boringly](https://archive.is/tXNbU) [market](https://archive.is/HwokV) their services as being about promoting health, but [polygenic scores should work as well for maximizing IQ as they do for minimizing cancer risk](https://www.gwern.net/Embryo-selection).[^polygenic-score] Making smarter people would be a transhumanist good in its own right, and [having smarter biological humans around at the time of our civilization's AI transition](https://www.lesswrong.com/posts/2KNN9WPcyto7QH9pi/this-failing-earth) would give us a better shot at having it go well.[^ai-transition-go-well]
  45
  46 [^eugenics-altruism]: If it seems odd to frame _eugenics_ as "altruistic", translate it as a term of art referring to the component of my actions dedicated to optimizing the world at large, as contrasted to "selfishly" optimizing my own experiences.
  47
  48 [^polygenic-score]: Better, actually: [the heritability of IQ is around 0.65](https://en.wikipedia.org/wiki/Heritability_of_IQ), as contrasted to [about 0.33 for cancer risk](https://pubmed.ncbi.nlm.nih.gov/26746459/).
  49
  50 [^ai-transition-go-well]: Natural selection eventually developed intelligent creatures, but evolution didn't know what it was doing and was not foresightfully steering the outcome in any particular direction. The more humans know what we're doing, the more our will determines the fate of the cosmos; the less we know what we're doing, the more our civilization is just another primordial soup for the next evolutionary transition.
  51
  52 But pushing on embryo selection only makes sense as an intervention for optimizing the future if AI timelines are sufficiently long, and the breathtaking pace (or too-fast-to-even-take-a-breath pace) of the deep learning revolution is so much faster than the pace of human generations, that it's looking unlikely that we'll get that much time. If our genetically uplifted children would need at least twenty years to grow up to be productive alignment researchers, but unaligned AI is [on track to end the world in twenty years](https://www.lesswrong.com/posts/AfH2oPHCApdKicM4m/two-year-update-on-my-personal-ai-timelines), we would need to start having those children _now_ in order for them to make any difference at all.
  53
  54 [It's ironic that "longtermism" got traction as the word for the cause area of benefitting the far future](https://applieddivinitystudies.com/longtermism-irony/), because the decision-relevant beliefs of most of the people who think about the far future, end up working out to extreme short-termism. Common-sense longtermism—a longtermism that assumed there's still going to be a recognizable world of humans in 2123—would care about eugenics, and would be willing to absorb political costs today in order to fight for a saner future. The story of humanity would not have gone better if Galileo had declined to publish for pre-emptive fear of the Inquisition.
  55
  56 But if you think the only hope for there _being_ a future flows through maintaining influence over what large tech companies are doing as they build transformative AI, declining to contradict the state religion makes more sense—if you don't have time to win a culture war, because you need to grab hold of the Singularity (or perform a [pivotal act](https://arbital.com/p/pivotal/) to prevent it) _now_. If the progressive machine marks you as a transphobic bigot, the machine's functionaries at OpenAI or Meta AI Research are less likely to listen to you when you explain why [their safety plan](https://openai.com/blog/introducing-superalignment) won't work, or why they should have a safety plan at all.
  57
  58 (I remarked to "Thomas" in mid-2022 that DeepMind [changing its Twitter avatar to a rainbow variant of their logo for Pride month](https://web.archive.org/web/20220607123748/https://twitter.com/DeepMind) was a bad sign.)
  59
  60 ### Perhaps, if the World Were at Stake
  61
  62 So isn't there a story here where I'm the villain, willfully damaging humanity's chances of survival by picking unimportant culture-war fights in the existential-risk-reduction social sphere, when _I know_ that the sphere needs to keep its nose clean in the eyes of the progressive egregore? _That's_ why Yudkowsky said the arguably-technically-misleading things he said about my Something to Protect: he had to, to keep our collective nose clean. The people paying attention to contemporary politics don't know what I know, and can't usefully be told. Isn't it better for humanity if my meager talents are allocated to making AI go well? Don't I have a responsibility to fall in line and take one for the team—if the world is at stake?
  63
  64 As usual, the Yudkowsky of 2009 has me covered. In his short story ["The Sword of Good"](https://www.yudkowsky.net/other/fiction/the-sword-of-good), our protagonist Hirou wonders why the powerful wizard Dolf lets other party members risk themselves fighting, when Dolf could have protected them:
  65
  66 > _Because Dolf was more important, and if he exposed himself to all the risk every time, he might eventually be injured_, Hirou's logical mind completed the thought. _Lower risk, but higher stakes. Cold but necessary—_
  67 >
  68 > _But would you_, said another part of his mind, _would you, Hirou, let your friends walk before you and fight, and occasionally die, if you_ knew _that you yourself were stronger and able to protect them? Would you be able to stop yourself from stepping in front?_
  69 >
  70 > _Perhaps_, replied the cold logic. _If the world were at stake._
  71 >
  72 > _Perhaps_, echoed the other part of himself, _but that is not what was actually happening._
  73
  74 That is, there's no story under which misleading people about trans issues is on Yudkowsky's critical path for shaping the intelligence explosion. I'd prefer him to have free speech, but if _he_ thinks he can't afford to be honest about things he [already got right in 2009](https://www.lesswrong.com/posts/QZs4vkC7cbyjL9XA9/changing-emotions), he could just not issue pronouncements on topics where he intends to _ignore counterarguments on political grounds_.
  75
  76 In [a March 2021 Twitter discussion about why not to trust organizations that refuse to explain their reasoning, Yudkowsky wrote](https://twitter.com/esyudkowsky/status/1374161729073020937):
  77
  78 > Having some things you say "no comment" to, is not at _all_ the same phenomenon as being an organization that issues Pronouncements. There are a _lot_ of good reasons to have "no comments" about things. Anybody who tells you otherwise has no life experience, or is lying.
  79
  80 Sure. But if that's your story, I think you need to _actually not comment_. ["[A]t least 20% of the ones with penises are actually women"](https://www.facebook.com/yudkowsky/posts/10154078468809228) is _not "no comment"._ ["[Y]ou're not standing in defense of truth if you insist on a word, brought explicitly into question, being used with some particular meaning"](https://twitter.com/ESYudkowsky/status/1067198993485058048) is _not "no comment"_. We [did get a clarification on that one](https://www.facebook.com/yudkowsky/posts/10158853851009228)—but then, within a matter of months, he turned around and came back with his "simplest and best proposal" about how the "important things [...] would be all the things [he's] read [...] from human beings who are people—describing reasons someone does not like to be tossed into a Male Bucket or Female Bucket, as it would be assigned by their birth certificate", _which is also not "no comment."_
  81
  82 It's a little uncomfortable that I seem to be arguing for a duty to self-censorship here. If he has selected "pro-trans" arguments he feels safe publishing, what's the harm in publishing them? How could I object to the addition of more Speech to the discourse?
  83
  84 But I don't think it's the mere addition of the arguments to the discourse that I'm objecting to. (If some garden-variety trans ally had made the same dumb arguments, I would make the same counterarguments, but I wouldn't feel betrayed.)
  85
  86 It's the false advertising—the pretense that Yudkowsky is still the unchallengeable world master of rationality, if he's going to behave like a garden-variety trans ally and reserve the right to _ignore counterarguments on political grounds_ (!!) when his incentives point that way.
  87
  88 In _Planecrash_, when Keltham decides he needs to destroy Golarion's universe on negative-leaning utilitarian grounds, he takes care to only deal with Evil people from then on, and not form close ties with the Lawful Neutral nation of Osirion, in order to not betray anyone who would have had thereby a reasonable expectation that their friend wouldn't try to destroy their universe: ["the stranger from dath ilan never pretended to be anyone's friend after he stopped being their friend"](https://glowfic.com/replies/1882395#reply-1882395).
  89
  90 Similarly, I think Yudkowsky should stop pretending to be our rationality teacher after he stopped being our rationality teacher and decided to be a politician instead.
  91
  92 I think it's significant that you don't see me picking fights with—say, Paul Christiano, because Paul Christiano doesn't repeatedly take a shit on my Something to Protect, because Paul Christiano isn't trying to be a religious leader. If Paul Christiano has opinions about transgenderism, we don't know about them. If we knew about them and they were correct, I would upvote them, and if we knew about them and they were incorrect, I would criticize them, but in either case, Christiano would not try to cultivate the impression that anyone who disagrees with him is insane. That's not his bag.
  93
  94 ### Decision Theory of Political Censorship
  95
  96 Yudkowsky's political cowardice is arguably puzzling in light of his timeless decision theory's recommendations against giving in to extortion.
  97
  98 The "arguably" is important, because randos on the internet are notoriously bad at drawing out the consequences of the theory, to the extent that Yudkowsky has said that he ["wish[es] that [he'd] never spoken on the topic"](https://twitter.com/ESYudkowsky/status/1509944888376188929)—and though I think I'm smarter than the average rando, I don't expect anyone to take my word for it. So let me disclaim that this is _my_ explanation of how Yudkowsky's decision theory _could be interpreted_ to recommend that he behave the way I want him to, without any pretense that I'm any sort of neutral expert witness on decision theory.
  99
 100 The idea of timeless decision theory is that you should choose the action that has the best consequences given that your decision is mirrored at all the places your decision algorithm is embedded in the universe.
 101
 102 The reason this is any different from the "causal decision theory" of just choosing the action with the best consequences (locally, without any regard to this "multiple embeddings in the universe" nonsense) is because it's possible for other parts of the universe to depend on your choices. For example, in the "Parfit's Hitchhiker" scenario, someone might give you a ride out of the desert if they predict you'll pay them back later. After you've already received the ride, you might think that you can get away with stiffing them—but if they'd predicted you would do that, they wouldn't have given you the ride in the first place. Your decision is mirrored inside the world-model of every other agent with a sufficiently good knowledge of you.
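
A toy Python sketch can make the structure of Parfit's Hitchhiker concrete. Everything in it is an illustrative assumption rather than anything from the decision-theory literature: the payoff numbers, the function names, and especially the idea that the driver predicts you perfectly by literally running your policy.

```python
# Toy model of Parfit's Hitchhiker. All names and numbers are hypothetical.

RIDE_VALUE = 1_000_000  # assumed value of getting out of the desert alive
FARE = 100              # assumed cost of paying the driver back later


def causal_policy(already_rescued: bool) -> str:
    """Choose the locally best action: once rescued, paying only costs money."""
    return "stiff" if already_rescued else "pay"


def timeless_policy(already_rescued: bool) -> str:
    """Choose as if the decision is mirrored in the driver's prediction."""
    return "pay"


def driver_gives_ride(policy) -> bool:
    """Assumption: the driver is a perfect predictor who simulates the policy."""
    return policy(already_rescued=True) == "pay"


def outcome(policy) -> int:
    """Net payoff to the hitchhiker under the given policy."""
    if not driver_gives_ride(policy):
        return 0  # left in the desert
    paid = policy(already_rescued=True) == "pay"
    return RIDE_VALUE - (FARE if paid else 0)


print("causal:", outcome(causal_policy))
print("timeless:", outcome(timeless_policy))
```

Under these assumptions, the policy that would stiff the driver never gets the ride at all, so the policy that pays comes out ahead. With an imperfect predictor the comparison would depend on how accurate the prediction is, which this sketch does not model.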
 103
 104 In particular, if you're the kind of agent that gives in to extortion—if you respond to threats of the form "Do what I want, or I'll hurt you" by doing what the threatener wants—that gives other agents an incentive to spend resources trying to extort you. On the other hand, if any would-be extortionist knows you'll never give in, they have no reason to bother trying. This is where the [standard](https://en.wikipedia.org/wiki/Government_negotiation_with_terrorists) ["Don't negotiate with terrorists"](/2018/Jan/dont-negotiate-with-terrorist-memeplexes/) advice comes from.
 105
 106 So, naïvely, doesn't Yudkowsky's "personally prudent to post your agreement with Stalin"[^gambit] gambit constitute giving in to an extortion threat of the form, "support the progressive position, or we'll hurt you", which Yudkowsky's own decision theory says not to do?
 107
 108 [^gambit]: In _ways that exhibit generally rationalist principles_, natch.
 109
 110 I can think of two reasons why the naïve objection might fail. (And who can say but that a neutral expert witness on decision theory wouldn't think of more?)
 111
 112 First, the true decision theory is subtler than "defy anything that you can commonsensically pattern-match as looking like 'extortion'"; the case for resisting extortion specifically rests on there existing a subjunctive dependence between your decision and the extortionist's decision: they threaten _because_ you'll give in, or don't bother _because_ you won't.
 113
 114 Okay, but then how do I compute this "subjunctive dependence" thing? Presumably it has something to do with the extortionist's decisionmaking process including a model of the target. How good does that model have to be for it to "count"?
 115
 116 I don't know—and if I don't know, I can't say that the relevant subjunctive dependence obviously pertains in the real-life science intellectual _vs._ social justice mob match-up. If the mob has been trained from past experience to predict that their targets will give in, should you defy them now in order to somehow make your current predicament "less real"? Depending on the correct theory of logical counterfactuals, the correct stance might be "We don't negotiate with terrorists, but [we do appease bears](/2019/Dec/political-science-epigrams/) and avoid avalanches" (because neither the bear's nor the avalanche's behavior is calculated based on our response), and the forces of political orthodoxy might be relevantly bear- or avalanche-like.
 117
 118 On the other hand, the relevant subjunctive dependence doesn't obviously _not_ pertain, either! Yudkowsky does seem to endorse commonsense pattern-matching to "extortion" in contexts [like nuclear diplomacy](https://twitter.com/ESYudkowsky/status/1580278376673120256). Or I remember back in 2009, Tyler Emerson was caught embezzling funds from the Singularity Institute, and SingInst made it a point of pride to prosecute on decision-theoretic grounds, when a lot of other nonprofits would have quietly and causal-decision-theoretically covered it up to spare themselves the embarrassment. Parsing social justice as an agentic "threat" rather than a non-agentic obstacle like an avalanche, does seem to line up with the fact that people punish heretics (who dissent from an ideological group) more than infidels (who were never part of the group to begin with), because heretics are more extortable—more vulnerable to social punishment from the original group.
 119
 120 Which brings me to the second reason the naïve anti-extortion argument might fail: [what counts as "extortion" depends on the relevant "property rights", what the "default" action is](https://www.lesswrong.com/posts/Qjaaux3XnLBwomuNK/countess-and-baron-attempt-to-define-blackmail-fail). If having free speech is the default, being excluded from the dominant coalition for defying the orthodoxy could be construed as extortion. But if being excluded from the coalition is the default, maybe toeing the line of orthodoxy is the price you need to pay in order to be included.
 121
 122 Yudkowsky has [a proposal for how bargaining should work between agents with different notions of "fairness"](https://www.lesswrong.com/posts/z2YwmzuT7nWx62Kfh/cooperating-with-agents-with-different-ideas-of-fairness). Suppose Greg and Heather are splitting a pie, and if they can't initially agree on how to split it, they have to fight over it until they do agree, destroying some of the pie in the process. Greg thinks the fair outcome is that they each get half the pie. Heather claims that she contributed more ingredients to the baking process and that it's therefore fair that she gets 75% of the pie, pledging to fight if offered anything less.
 123
 124 If Greg were a causal decision theorist, he might agree to the 75/25 split, reasoning that 25% of the pie is better than fighting until the pie is destroyed. Yudkowsky argues that this is irrational: if Greg is willing to agree to a 75/25 split, then Heather has no incentive not to adopt such a self-favoring definition of "fairness". (And _vice versa_ if Heather's concept of fairness is the "correct" one.)
 125
 126 Instead, Yudkowsky argues, Greg should behave so as to only do worse than the fair outcome if Heather also does worse: for example, by accepting a 48/32 split in Heather's favor (after 100−(32+48) = 20% of the pie has been destroyed by the costs of fighting) or a 42/18 split (where 40% of the pie has been destroyed). This isn't Pareto-optimal (it would be possible for both Greg and Heather to get more pie by reaching an agreement with less fighting), but it's worth it to Greg to burn some of Heather's utility fighting in order to resist being exploited by her, and at least it's better than the equilibrium where the entire pie gets destroyed (which is Nash because neither party can unilaterally stop fighting).
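
The arithmetic in those two example splits can be checked with a few lines of Python. This is only a sketch of the numbers given above, not an implementation of Yudkowsky's bargaining proposal; the function name, the dictionary keys, and the choice to measure everything in percent of the original pie are assumptions made for illustration.

```python
# Check the example splits: shares are percentages of the original pie.

GREG_FAIR_SHARE = 50  # Greg's notion of fairness: each gets half


def describe_split(heather: int, greg: int) -> dict:
    """Summarize a post-fighting split (hypothetical helper)."""
    destroyed = 100 - (heather + greg)
    return {
        "heather": heather,
        "greg": greg,
        "destroyed": destroyed,
        # Heather's aggressive demand "pays" only if she ends up with more
        # than she would have gotten under Greg's 50/50 notion of fairness.
        "demand_paid_off": heather > GREG_FAIR_SHARE,
        # Greg only does worse than his fair share if Heather does too.
        "both_below_fair": heather < GREG_FAIR_SHARE and greg < GREG_FAIR_SHARE,
    }


for heather, greg in [(48, 32), (42, 18)]:
    print(describe_split(heather, greg))
```

In both examples Heather ends up below the 50% she would have received by accepting Greg's notion of fairness, which is the property that removes her incentive to adopt a self-favoring definition in the first place.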
 127
 128 It seemed to me that in the contest over the pie of Society's shared map, the rationalist Caliphate was letting itself get exploited by the progressive Egregore, doing worse than the fair outcome without dealing any damage to the Egregore in return. Why?
 129
 130 [The logic of dump stats](/2023/Dec/agreeing-with-stalin-in-ways-that-exhibit-generally-rationalist-principles/#dump-stats), presumably. Bargaining to get AI risk on the shared map—not even to get it taken seriously as we would count "taking it seriously", but just acknowledged at all—was hard enough. Trying to challenge the Egregore about an item that it actually cared about would trigger more fighting than we could afford.
 131
 132 In my illustrative story, if Greg and Heather destroy the pie fighting, then neither of them gets any pie. But in more complicated scenarios (including the real world), there is no guarantee that non-Pareto Nash equilibria are equally bad for everyone.
 133
 134 I had a Twitter exchange with Yudkowsky in January 2020 that revealed some of his current-year thinking about Nash equilibria. I [had Tweeted](https://twitter.com/zackmdavis/status/1206718983115698176):
 135
 136 > 1940s war criminal defense: "I was only following orders!"
 137 > 2020s war criminal defense: "I was only participating in a bad Nash equilibrium that no single actor can defy unilaterally!"
 138
 139 (The language of the latter being [a reference to Yudkowsky's _Inadequate Equilibria_](https://equilibriabook.com/molochs-toolbox/).)
 140
 141 Yudkowsky [quote-Tweet dunked on me](https://twitter.com/ESYudkowsky/status/1216788984367419392):
 142
 143 > Well, YES. Paying taxes to the organization that runs ICE, or voting for whichever politician runs against Trump, or trading with a doctor benefiting from an occupational licensing regime; these acts would all be great evils if you weren't trapped.
 144
 145 I pointed out the voting case as one where he seemed to be disagreeing with his past self, linking to 2008's ["Stop Voting for Nincompoops"](https://www.lesswrong.com/posts/k5qPoHFgjyxtvYsm7/stop-voting-for-nincompoops). What changed his mind?
 146
 147 "Improved model of the social climate where revolutions are much less startable or controllable by good actors," he said. "Having spent more time chewing on Nash equilibria, and realizing that the trap is _real_ and can't be defied away even if it's very unpleasant."
 148
 149 I asked what was wrong with the disjunction from "Stop Voting for Nincompoops", where the earlier Yudkowsky had written that it's hard to see who should accept the argument to vote for the lesser of two evils, but refuse to accept the argument against voting because it won't make a difference. Unilaterally voting for Clinton wouldn't stop Trump.
 150
 151 "Vote when you're part of a decision-theoretic logical cohort large enough to change things, or when you're worried about your reputation and want to be honest about whether you voted," he replied.
 152
 153 "How do I compute whether I'm in a large enough decision-theoretic cohort?" I asked. Did we know that, or was that still on the open problems list?
 154
 155 Yudkowsky said that he [traded his vote for a Clinton swing state vote](https://en.wikipedia.org/wiki/Vote_pairing_in_the_2016_United_States_presidential_election), partially hoping that that would scale, "but maybe to a larger degree because [he] anticipated being asked in the future if [he'd] acted against Trump".
 156
 157 The reputational argument seems in line with Yudkowsky's [pathological obsession with not-technically-lying](https://www.lesswrong.com/posts/MN4NRkMw7ggt9587K/firming-up-not-lying-around-its-edge-cases-is-less-broadly). People asking if you acted against Trump are looking for a signal of coalitional loyalty. By telling them he traded his vote, Yudkowsky can pass their test without lying.
 158
 159 I guess that explains everything. He doesn't think he's part of a decision-theoretic logical cohort large enough to change things. He's not anticipating being asked in the future if he's acted against gender ideology. He doesn't care about trashing his reputation with me, because I don't matter.
 160
 161 Curtis Yarvin [likes to compare](/2020/Aug/yarvin-on-less-wrong/) Yudkowsky to Sabbatai Zevi, the 17th-century Jewish religious leader purported to be the Messiah, who later [converted to Islam under coercion from the Ottomans](https://en.wikipedia.org/wiki/Sabbatai_Zevi#Conversion_to_Islam). "I know, without a shadow of a doubt, that in the same position, Eliezer Yudkowsky would also convert to Islam," said Yarvin.
 162
 163 I don't think this is as much of a burn as Yarvin does. Zevi was facing some very harsh coercion: a choice to convert to Islam, "prove" his divinity via deadly trial by ordeal, or just be impaled outright. Extortion-resistant decision theories aside, it's hard not to be sympathetic to someone facing this trilemma who chose to convert.
 164
 165 So to me, the more damning question is this—
 166
 167 If in the same position as Yudkowsky, would Sabbatai Zevi also declare that 30% of the ones with penises are actually women?
 168
 169 ### The Dolphin War (June 2021)
 170
 171 In June 2021, MIRI Executive Director Nate Soares [wrote a Twitter thread arguing that](https://twitter.com/So8res/status/1401670792409014273) "[t]he definitional gynmastics required to believe that dolphins aren't fish are staggering", which [Yudkowsky retweeted](https://archive.is/Ecsca).[^not-endorsements]
 172
 173 [^not-endorsements]: In general, retweets are not necessarily endorsements—sometimes people just want to draw attention to some content without further comment or implied approval—but I was inclined to read this instance as implying approval, partially because this doesn't seem like the kind of thing someone would retweet for attention-without-approval, and partially because of the working relationship between Soares and Yudkowsky.
 174
 175 Soares's points seemed cribbed from part I of Scott Alexander's ["... Not Man for the Categories"](https://slatestarcodex.com/2014/11/21/the-categories-were-made-for-man-not-man-for-the-categories/). Soares's [reference to the Book of Jonah](https://twitter.com/So8res/status/1401670796997660675) made it seem particularly unlikely that he had invented the argument independently from Alexander. [One of the replies (which Soares Liked) pointed out the similar _Slate Star Codex_ article](https://twitter.com/max_sixty/status/1401688892940509185), [as did](https://twitter.com/NisanVile/status/1401684128450367489) [a couple of](https://twitter.com/roblogic_/status/1401699930293432321) quote-Tweet discussions.
 176
 177 The elephant in my brain took this as another occasion to _flip out_. I didn't immediately see anything for me to overtly object to in the thread itself—[I readily conceded that](https://twitter.com/zackmdavis/status/1402073131276066821) there was nothing necessarily wrong with wanting to use the symbol "fish" to refer to the cluster of similarities induced by convergent evolution to the aquatic habitat rather than the cluster of similarities induced by phylogenetic relatedness—but in the context of our subculture's history, I read this as Soares and Yudkowsky implicitly lending more legitimacy to "... Not Man for the Categories", which post I had just dedicated more than three years of my life to rebutting in [increasing](/2018/Feb/the-categories-were-made-for-man-to-make-predictions/) [technical](https://www.lesswrong.com/posts/esRZaPXSHgWzyB2NL/where-to-draw-the-boundaries) [detail](https://www.lesswrong.com/posts/onwgTH6n8wxRSo2BJ/unnatural-categories-are-optimized-for-deception), specifically using dolphins as my central example—which Soares didn't necessarily have any reason to have known about, but Yudkowsky definitely did. Was I paranoid to read this as a potential [dogwhistle](https://en.wikipedia.org/wiki/Dog_whistle_(politics))? It just seemed implausible that Soares would be Tweeting that dolphins are fish in the counterfactual in which "... Not Man for the Categories" had never been published.
 178
 179 After a little more thought, I decided that Soares's thread _was_ overtly objectionable, and [quickly wrote up a reply on _Less Wrong_](https://www.lesswrong.com/posts/aJnaMv8pFQAfi9jBm/reply-to-nate-soares-on-dolphins): Soares wasn't merely advocating for a "swimmy animals" sense of the word _fish_ to become more accepted usage, but specifically deriding phylogenetic definitions as unmotivated for everyday use ("definitional gynmastics [_sic_]"), and _that_ was wrong. It's true that most language users don't directly care about evolutionary relatedness, but [words aren't identical with their definitions](https://www.lesswrong.com/posts/i2dfY65JciebF3CAo/empty-labels). Genetics is at the root of the causal graph underlying all other features of an organism; creatures that are more closely evolutionarily related are more similar in general. Classifying things by evolutionary lineage isn't an arbitrary æsthetic whim by people who care about genealogy for no reason. We need the natural category of "mammals (including marine mammals)" to make sense of how dolphins are warm-blooded, breathe air, and nurse their live-born young, and the natural category of "finned cold-blooded vertebrate gill-breathing swimmy animals (which excludes marine mammals)" is also something that it's reasonable to have a word for.
 180
 181 (Somehow, it felt appropriate to use a quote from Arthur Jensen's ["How Much Can We Boost IQ and Scholastic Achievement?"](https://en.wikipedia.org/wiki/How_Much_Can_We_Boost_IQ_and_Scholastic_Achievement%3F) as an epigraph.)
 182
 183 On [Twitter](https://twitter.com/So8res/status/1402888263593959433) Soares conceded my main points, but said that the tone, and the [epistemic-status followup thread](https://twitter.com/So8res/status/1401761124429701121), were intended to indicate that the original thread was "largely in jest"—"shitposting"—but that he was "open to arguments that [he was] making a mistake here."
 184
 185 I didn't take that too well, and threw an eleven-Tweet tantrum. Soares wrote a longer comment on _Less Wrong_ the next morning, and I [pointed out that](https://www.greaterwrong.com/posts/aJnaMv8pFQAfi9jBm/reply-to-nate-soares-on-dolphins/comment/BBtSuWcdaFyvgddE4) Soares's followup thread had lamented ["the fact that nobody's read A Human's Guide to Words or w/e"](https://twitter.com/So8res/status/1401761130041659395), but—with respect—he wasn't behaving like _he_ had read it. Specifically, [#30](https://www.greaterwrong.com/posts/d5NyJ2Lf6N22AD9PB/where-to-draw-the-boundary) on the list of ["37 Ways Words Can Be Wrong"](https://www.greaterwrong.com/posts/FaJaCgqBKphrDzDSj/37-ways-that-words-can-be-wrong) had characterized the position that dolphins are fish as "playing nitwit games". This didn't seem controversial in 2008.
 186
 187 And yet it would seem that sometime between 2008 and the current year, the "rationalist" party line (as observed in the public statements of SingInst/MIRI leadership) on whether dolphins are fish shifted from (my paraphrases) "No; _despite_ the surface similarities, that categorization doesn't carve reality at the joints; stop playing nitwit games" to "Yes, _because_ of the surface similarities; those who contend otherwise are the ones playing nitwit games." A complete 180° reversal, on this specific example! Why? What changed?
 188
 189 It would make sense if people's opinions changed due to new arguments. (Indeed, Yudkowsky's original "stop playing nitwit games" dismissal had been sloppy, and I had had occasion in ["Where to Draw the Boundaries?"](https://www.lesswrong.com/posts/esRZaPXSHgWzyB2NL/where-to-draw-the-boundaries) to explain the specific senses in which dolphins both do and do not cluster with fish.)
 190
 191 But when people change their minds due to new arguments, you'd expect them to acknowledge the change, and explain how the new arguments show that what they thought before was actually wrong. Soares hadn't even acknowledged the change!
 192
 193 Soares wrote [a comment explaining](https://www.greaterwrong.com/posts/aJnaMv8pFQAfi9jBm/reply-to-nate-soares-on-dolphins/comment/HwSkiN62QeuEtGWpN) why he didn't think it was such a large reversal. I [started drafting a counterreply](/ancillary/dolphin-war/), but decided that it would need to become a full post on the timescale of days or weeks, partially because I needed to think through how to reply to Soares about paraphyletic groups, and partially because the way the associated Twitter discussion had gone (including some tussling with Yudkowsky) made me want to modulate my tone. (I noted that I had probably lost some in-group credibility in the Twitter fight, but the information gained seemed more valuable. Losing in-group credibility didn't hurt so much when I didn't respect the group anymore.)
 194
 195 Subjectively, I was feeling time pressure on my reply, and in the meantime, I ended up adding [a huffy comment](https://www.greaterwrong.com/posts/aJnaMv8pFQAfi9jBm/reply-to-nate-soares-on-dolphins/comment/rMHcWfqkH89LWt4y9) to the _Less Wrong_ thread taking issue with Soares's still-flippant tone. That was a terrible performance on my part. It got downvoted to oblivion, and I deserved it.
 196
 197 In general, my social behavior during this entire episode was histrionic, and I probably could have gotten an equal-or-better outcome if I had kept my cool. The reason I didn't feel like keeping my cool was that after years of fighting this Category War, MIRI doubling down on "dolphins are fish" felt like a gratuitous insult. I was used to "rationalists" ever-so-humbly claiming to be open to arguments that they were making a mistake, but I couldn't take such assurances seriously if they were going to keep sending PageRank-like credibility to "... Not Man for the Categories".
 198
 199 Soares [wrote that](https://www.greaterwrong.com/posts/aJnaMv8pFQAfi9jBm/reply-to-nate-soares-on-dolphins/comment/8nmjnrm4cwgCCyYrG) I was persistently mis-modeling his intentions, that I seemed to be making a plea for deference that he rejected.
 200
 201 I don't think I wanted deference, though. I write these thousands of words in the hopes that people will read my arguments and think it through for themselves; I would never expect anyone to take my word for the conclusion. What I was hoping for was a fair hearing, and by that point, I had lost hope of getting one.
 202
 203 As for my skill at modeling intent, I think it's less relevant than Soares seemed to think. I believe his self-report that he wasn't trying to make a coded statement about gender; my initial impression otherwise _was_ miscalibrated. (As he pointed out, his "dolphins are fish" position could be given an "anti-trans" interpretation, too, in the form of "you intellectuals get your hands off my intuitive concepts". The association between "dolphins are fish" and "trans women are women" ran through their conjunction in Alexander's "... Not Man for the Categories", rather than being intrinsic to the beliefs themselves.)
 204
 205 The thing is, I was _right_ to notice the similarity between Soares's argument and "... Not Man for the Categories." Soares's [own account](https://www.greaterwrong.com/posts/aJnaMv8pFQAfi9jBm/reply-to-nate-soares-on-dolphins/comment/HwSkiN62QeuEtGWpN) agreed that there was a causal influence. Okay, so _Nate_ wasn't trying to play gender politics; Scott just alerted him to the idea that people didn't used to be interested in drawing their categories around phylogenetics, and Nate ran with that thought.
 206
 207 So where did _Scott_ get it from?
 208
 209 I think he pulled it out of his ass because it was politically convenient. I think if you asked Scott Alexander whether dolphins are fish in 2012, he would have said, "No, they're mammals," like any other educated adult.
 210
 211 In a world where the clock of "political time" had run a little bit slower, such that the fight for gay marriage had taken longer [such that the progressive _zeitgeist_ hadn't pivoted to trans as the new cause _du jour_](/2019/Aug/the-social-construction-of-reality-and-the-sheer-goddamned-pointlessness-of-reason/), I don't think Alexander would have had the occasion to write "... Not Man for the Categories." And in that world, I don't think "Dolphins are fish, fight me" or "Acknowledge that all of our categories are weird and a little arbitrary" would have become memes in our subculture.
 212
 213 This case is like [radiocontrast dye](https://en.wikipedia.org/wiki/Radiocontrast_agent) for [dark side epistemology](https://www.lesswrong.com/posts/XTWkjCJScy2GFAgDt/dark-side-epistemology). Because Scott Alexander won [the talent lottery](https://slatestarcodex.com/2015/01/31/the-parable-of-the-talents/) and writes faster than everyone else, he has the power to _sneeze his mistakes_ onto everyone who trusts Scott to have done his homework, even when he obviously hasn't.
 214
 215 [No one can think fast enough to think all their own thoughts](https://www.lesswrong.com/posts/2MD3NMLBPCqPfnfre/cached-thoughts), but you would hope for an intellectual community that can do error-correction, such that collective belief trends toward truth as [the signal of good arguments rises above the noise](https://slatestarcodex.com/2017/03/24/guided-by-the-beauty-of-our-weapons/), rather than being copied from celebrity leaders (including the mistakes).
 216
 217 It's true that there's a cluster of similarities induced by adaptations to the aquatic environment. It's reasonable to want to talk about that subspace. But it doesn't follow that phylogenetics is irrelevant. Genetics being at the root of the causal graph induces the kind of conditional independence relationships that make "categories" a useful AI trick.
 218
 219 But in a world where more people are reading "... Not Man for the Categories" than ["Mutual Information, and Density in Thingspace"](https://www.lesswrong.com/posts/yLcuygFfMfrfK8KjF/mutual-information-and-density-in-thingspace), and even the people who have read "Density in Thingspace" (once, ten years ago) are having most of their conversations with people who only read "... Not Man for the Categories"—what happens is that you end up with a so-called "rationalist" culture that completely forgot the hidden-Bayesian-structure-of-cognition/carve-reality-at-the-joints skill. People only remember the subset of "A Human's Guide to Words" that's useful for believing whatever you want (by cherry-picking the features you need to include in category Y to make your favorite "X is a Y" sentence look "true", which is easy for intricate high-dimensional things like biological creatures that have a lot of similarities to cherry-pick from), rather than the part about the conditional independence structure in the environment.
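
(To make the conditional-independence point concrete, here is a toy simulation. It's a minimal sketch under made-up assumptions, not real biology: the 0.9/0.1 probabilities, the trait names, and the `simulate` and `correlation` helpers are all hypothetical choices for illustration. Two binary traits each depend only on a hidden category, so they are strongly correlated across the whole population but close to uncorrelated once you condition on the category.)

```python
import random


def simulate(n=20000, seed=0):
    """Draw n toy creatures: a hidden category plus two traits that depend only on it."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        mammal = rng.random() < 0.5
        # Assumed, made-up numbers: each trait's probability is set by the category alone.
        p = 0.9 if mammal else 0.1
        warm_blooded = rng.random() < p
        nurses_young = rng.random() < p
        rows.append((mammal, warm_blooded, nurses_young))
    return rows


def correlation(pairs):
    """Pearson correlation between two 0/1 variables given as (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs) / n
    var_y = sum((y - mean_y) ** 2 for _, y in pairs) / n
    return cov / (var_x * var_y) ** 0.5


rows = simulate()
# Correlation between the two traits, ignoring the category.
overall = correlation([(w, m) for _, w, m in rows])
# Correlation between the same traits, within each category separately.
within_mammals = correlation([(w, m) for c, w, m in rows if c])
within_others = correlation([(w, m) for c, w, m in rows if not c])

print(f"overall: {overall:.2f}")
print(f"within mammals: {within_mammals:.2f}")
print(f"within non-mammals: {within_others:.2f}")
```

(Under these assumed numbers, the overall correlation is about 0.64 in expectation, while the within-category correlations hover near zero. Knowing the category is what lets you treat the traits as independent pieces of evidence, which is the sense in which the category is doing inferential work.)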
 220
 221 After I cooled down, I did eventually write up the explanation for why paraphyletic categories are fine, in ["Blood Is Thicker Than Water"](https://www.lesswrong.com/posts/vhp2sW6iBhNJwqcwP/blood-is-thicker-than-water). But I'm not sure that anyone cared.
 222
 223 ### Pretender to the Caliphate
 224
 225 I got a chance to talk to Yudkowsky in person at the 2021 Event Horizon[^event-horizon] Fourth of July party. In accordance with the privacy norms I'm adhering to while telling this Whole Dumb Story, I don't think I should elaborate on what was said. (It felt like a private conversation, even if most of it was outdoors at a party. No one joined in, and if anyone was listening, I didn't notice them.)
 226
 227 [^event-horizon]: Event Horizon was the name of a group house in Berkeley.
 228
 229 I will say that it finalized my sense that the vision of rationalism he had preached in the Sequences was dead as a cultural force. I was somewhat depressed for months afterwards.
 230
 231 It wouldn't be so bad if Yudkowsky weren't trying to sell himself as a _de facto_ religious leader,[^religious-leader] profiting from the conflation of _rationalist_ in the sense of "one who aspires to systematically correct reasoning" and _rationalist_ as a member of his fan-club/personality-cult.
 232
 233 [^religious-leader]: "Religious leader" continues to seem like an apt sociological description, even if [no supernatural claims are being made](https://www.lesswrong.com/posts/u6JzcFtPGiznFgDxP/excluding-the-supernatural).
 234
 235 But he does seem to actively encourage this conflation. Contrast the ["Litany Against Gurus"](https://www.lesswrong.com/posts/t6Fe2PsEwb3HhcBEr/the-litany-against-gurus) from the Sequences, to the way he sneers at "post-rationalists"—or even "Earthlings" in general (in contrast to his fictional world of dath ilan). The framing is optimized to delegitimize dissent. [Motte](https://slatestarcodex.com/2014/11/03/all-in-all-another-brick-in-the-motte/): someone who's critical of central "rationalists" like Yudkowsky or Alexander; bailey: someone who's moved beyond reason itself.
 236
 237 One example that made me furious came in September 2021. Yudkowsky, replying to Scott Alexander on Twitter, [wrote](https://twitter.com/ESYudkowsky/status/1434906470248636419):
 238
 239 > Anyways, Scott, this is just the usual division of labor in our caliphate: we're both always right, but you cater to the crowd that wants to hear it from somebody too modest to admit that, and I cater to the crowd that wants somebody out of that closet.
 240
 241 I understand, of course, that it was meant as humorous exaggeration. But I think it still has the effect of discouraging people from criticizing Yudkowsky or Alexander because they're the leaders of the Caliphate. I had just spent more than three and a half years of my life[^years-of-my-life] [explaining in](/2018/Feb/the-categories-were-made-for-man-to-make-predictions/) [exhaustive](https://www.lesswrong.com/posts/esRZaPXSHgWzyB2NL/where-to-draw-the-boundaries), [exhaustive](https://www.lesswrong.com/posts/onwgTH6n8wxRSo2BJ/unnatural-categories-are-optimized-for-deception) [detail](https://www.lesswrong.com/posts/vhp2sW6iBhNJwqcwP/blood-is-thicker-than-water), with math, how Alexander was wrong about something, no one serious actually disagreed, and Yudkowsky was still using his social power to boost Scott's right-about-everything (!!) reputation. That seemed egregiously unfair, in a way that wasn't dulled by "it was just a joke."
 242
 243 [^years-of-my-life]: I started outlining ["The Categories Were Made for Man to Make Predictions"](/2018/Feb/the-categories-were-made-for-man-to-make-predictions/) in January 2018. I would finally finish ["Blood Is Thicker Than Water"](https://www.lesswrong.com/posts/vhp2sW6iBhNJwqcwP/blood-is-thicker-than-water), following up on the "dolphins are fish" claim later that month of September 2021.
 244
 245 Or [as Yudkowsky had once put it](https://www.facebook.com/yudkowsky/posts/10154981483669228):
 246
 247 > I know that it's a bad sign to worry about which jokes other people find funny. But you can laugh at jokes about Jews arguing with each other, and laugh at jokes about Jews secretly being in charge of the world, and not laugh at jokes about Jews cheating their customers. Jokes do reveal conceptual links and some conceptual links are more problematic than others.
 248
 249 I could understand wanting to avoid politically contentious topics because existential risk reduction is astronomically more important, but that rationale couldn't justify this kind of cartel behavior.[^cartel-rationale]
 250
 251 [^cartel-rationale]: Unless the idea was to reduce existential risk by drawing more people into our cult, suggesting an instrumental strategy of puffing up Scott Alexander's reputation, since he was the primary intake funnel now that _Harry Potter and the Methods of Rationality_ was completed?
 252
 253 An analogy: racist jokes are also just jokes. Irene says, "What's the difference between a black dad and a boomerang? A boomerang comes back." Jonas says, "That's super racist! Tons of African-American fathers are devoted parents!!" Irene says, "Chill out, it was just a joke." In a way, Irene is right. It was just a joke; no sane person could think that Irene was literally claiming that all black men are deadbeat dads. But the joke only makes sense in the first place in context of a culture where the black-father-abandonment stereotype is operative. If you thought the stereotype was false, or if you were worried about it being a self-fulfilling prophecy, you would find it tempting to be a humorless scold and get angry at the joke-teller.[^offensive-jokes-reflect-conceptual-links]
 254
 255 [^offensive-jokes-reflect-conceptual-links]: I once wrote [a post whimsically suggesting that trans women should owe cis women royalties](/2019/Dec/comp/) for copying the female form (as "intellectual property"). In response to a reader who got offended, I [ended up adding](/source?p=Ultimately_Untrue_Thought.git;a=commitdiff;h=03468d274f5) an "epistemic status" line to clarify that it was not a serious proposal.
 256
 257     But if knowing it was a joke partially mollifies the offended reader who thought I might have been serious, I don't think they should be completely mollified, because the joke (while a joke) reflects something about my thinking when I'm being serious: I don't think sex-based collective rights are inherently a crazy idea; I think something of value has been lost when women who want female-only spaces can't have them, and the joke reflects the conceptual link between the idea that something of value has been lost, and the idea that people who have lost something of value are entitled to compensation.
 258
 259 Similarly, the "Caliphate" humor only makes sense in the first place in the context of a celebrity culture where deferring to Yudkowsky and Alexander is expected behavior, in a way that deferring to [Julia Galef](https://en.wikipedia.org/wiki/Julia_Galef) or [John S. Wentworth](https://www.lesswrong.com/users/johnswentworth) is not expected behavior.
 260
 261 ### Replies to David Xu on Category Cruxes [working § title]
 262
 263 I don't think the motte-and-bailey concern is hypothetical. When I [indignantly protested](https://twitter.com/zackmdavis/status/1435059595228053505) the "we're both always right" remark, one David Xu [commented](https://twitter.com/davidxu90/status/1435106339550740482): "speaking as someone who's read and enjoyed your LW content, I do hope this isn't a sign that you're going full post-rat"—as if my criticism of Yudkowsky's self-serving bluster itself marked me as siding with the "post-rats"!
 264
 265 Concerning my philosophy-of-language grievance, [Xu wrote](https://twitter.com/davidxu90/status/1436007025545125896) (with Yudkowsky ["endors[ing] everything [Xu] just said"](https://twitter.com/ESYudkowsky/status/1436025983522381827)):
 266
 267 > I'm curious what might count for you as a crux about this; candidate cruxes I could imagine include: whether some categories facilitate inferences that _do_, on the whole, cause more harm than benefit, and if so, whether it is "rational" to rule that such inferences should be avoided when possible, and if so, whether the best way to disallow a large set of potential inferences is [to] proscribe the use of the categories that facilitate them—and if _not_, whether proscribing the use of a category in _public communication_ constitutes "proscribing" it more generally, in a way that interferes with one's ability to perform "rational" thinking in the privacy of one's own mind.
 268 >
 269 > That's four possible (serial) cruxes I listed, one corresponding to each "whether".
 270
 271 I reply: on the first and second cruxes, concerning whether some categories facilitate inferences that cause more harm than benefit on the whole and whether they should be avoided when possible, I ask: harm _to whom?_ Not all agents have the same utility function! If some people are harmed by other people making certain probabilistic inferences, then it would seem that there's a conflict between the people harmed (who prefer that such inferences be avoided when possible), and people who want to make and share probabilistic inferences about reality (who think that that which can be destroyed by the truth, should be).
 272
 273 On the third crux, whether the best way to disallow a large set of potential inferences is to proscribe the use of the categories that facilitate them: well, it's hard to be sure whether it's the _best_ way: no doubt a more powerful intelligence could search over a larger space of possible strategies than me. But yeah, if your goal is to prevent people from making inferences, then preventing them from using the corresponding language seems like a pretty effective way to do it!
 274
 275 On the fourth crux, whether proscribing the use of a category in public communication constitutes "proscribing" in a way that interferes with one's ability to think in the privacy of one's own mind: I think this is mostly true for humans. We're social animals. To the extent that we can do higher-grade cognition at all, we do it using our language faculties that are designed for communicating with others. How are you supposed to think about things that you don't have words for?
 276
 277 Xu continues:
 278
 279 > I could have included a fifth and final crux about whether, even _if_ The Thing In Question interfered with rational thinking, that might be worth it; but this I suspect you would not concede, and (being a rationalist) it's not something I'm willing to concede myself, so it's not a crux in a meaningful sense between us (or any two self-proclaimed "rationalists").
 280 >
 281 > My sense is that you have (thus far, in the parts of the public discussion I've had the opportunity to witness) been behaving as though the _one and only crux in play_—that is, the True Source of Disagreement—has been the fifth crux, the thing I refused to include with the others of its kind. Your accusations against the caliphate _only make sense_ if you believe the dividing line between your behavior and theirs is caused by a disagreement as to whether "rational" thinking is "worth it"; as opposed to, say, what kind of prescriptions "rational" thinking entails, and which (if any) of those prescriptions are violated by using a notion of gender (in public, where you do not know in advance who will receive your communications) that does not cause massive psychological damage to some subset of people.
 282 >
 283 > Perhaps it is your argument that all four of the initial cruxes I listed are false; but even if you believe that, it should be within your set of ponderable hypotheses that people might disagree with you about that, and that they might perceive the disagreement to be _about_ that, rather than (say) about whether subscribing to the Blue Tribe view of gender makes them a Bad Rationalist, but That's Okay because it's Politically Convenient.
 284 >
 285 > This is the sense in which I suspect you are coming across as failing to properly Other-model.
 286
 287 After everything I've been through over the past seven years, I'm inclined to think it's not a "disagreement" at all.
 288
 289 It's a conflict. I want to facilitate people making inferences (full stop). The Caliphate doesn't want to facilitate people publicly making inferences that, on the whole, cause more harm than benefit—for example, by putatively causing massive psychological damage to some subset of people. This isn't a disagreement about rationality, because telling the truth isn't rational _if you don't want people to know things_.
 290
 291 I anticipate this being construed as me doubling down on failing to properly Other-model, because I'm associating my side of the conflict with "telling the truth", which is a positive-valence description. But ... what am I getting wrong, substantively, as a matter of fact rather than mere tone? It seems to me that declining to "facilitate inferences that _do_, on the whole, cause more harm than benefit" (Xu's words, verbatim) is a form of not wanting people to know things.
 292
 293 It's not like my side of the conflict isn't biting any bullets, either. I'm saying that I'm fine with my inferences _causing more harm than benefit_. Isn't that monstrous of me? Why would someone do that?
 294
 295 One of the better explanations of this that I know of was (again, as usual) authored by Yudkowsky in 2007, in a post titled ["Doublethink (Choosing to be Biased)"](https://www.lesswrong.com/posts/Hs3ymqypvhgFMkgLb/doublethink-choosing-to-be-biased). It's again worth quoting at length—
 296
 297 > What if self-deception helps us be happy? What if just running out and overcoming bias will make us—gasp!—_unhappy?_ Surely, _true_ wisdom would be _second-order_ rationality, choosing when to be rational. That way you can decide which cognitive biases should govern you, to maximize your happiness.
 298 >
 299 > [...]
 300 >
 301 > For second-order rationality to be genuinely _rational_, you would first need a good model of reality, to extrapolate the consequences of rationality and irrationality. If you then chose to be first-order irrational, you would need to forget this accurate view. And then forget the act of forgetting. I don't mean to commit the logical fallacy of generalizing from fictional evidence, but I think Orwell did a good job of extrapolating where this path leads.
 302 >
 303 > You can't know the consequences of being biased, until you have already debiased yourself. And then it is too late for self-deception.
 304 >
 305 > The other alternative is to choose blindly to remain biased, without any clear idea of the consequences. This is not second-order rationality. It is willful stupidity.
 306 >
 307 > [...]
 308 >
 309 > One of chief pieces of advice I give to aspiring rationalists is "Don't try to be clever." And, "Listen to those quiet, nagging doubts." If you don't know, you don't know _what_ you don't know, you don't know how _much_ you don't know, and you don't know how much you _needed_ to know.
 310 >
 311 > There is no second-order rationality. There is only a blind leap into what may or may not be a flaming lava pit. Once you _know_, it will be too late for blindness.
 312
 313 The post opens with an epigraph from George Orwell's _1984_, in which O'Brien (a loyal member of the ruling Party in the totalitarian state depicted in the novel) burns a photograph of Jones, Aaronson, and Rutherford—former Party leaders whose existence has been censored from the historical record. Immediately after burning the photograph, O'Brien denies that it ever existed.
 314
 315 Orwell was too optimistic. In some ways, people's actual behavior is worse than what he depicted. The Party of Orwell's _1984_ covers its tracks: O'Brien takes care to burn the photograph before denying memory of it, because it would be too absurd for him to act like the photo had never existed while it was still right there in front of him.
 316
 317 In contrast, Yudkowsky's Caliphate of the current year doesn't even bother covering its tracks. It doesn't need to: people just don't remember things. ["Changing Emotions"](https://www.lesswrong.com/posts/QZs4vkC7cbyjL9XA9/changing-emotions) is still up and not retracted, but that didn't stop the Yudkowsky of 2016 from pivoting to ["at least 20% of the ones with penises are actually women"](https://www.facebook.com/yudkowsky/posts/10154078468809228) when that became a politically favorable thing to say.
 318
 319 I claim that "Changing Emotions" and the 2016 Facebook post effectively contradict each other, even if I can't point to a pair of sentences, one from each, that are identical except that one includes the word _not_. The former explains not only why men who fantasize about being women are out of luck given foreseeable technology, but also why their desires may not even be coherent (!), whereas the latter claims that men who wish they were women may, in fact, already be women in some unspecified psychological sense. One could try to argue that "Changing Emotions" is addressing cis men with a weird sex-change fantasy, whereas the "ones with penises are actually women" claim was about trans women, which are a different thing—or simply that Yudkowsky changed his mind.
 320
 321 But when people change their minds (as opposed to merely changing what they say in public for political reasons), you expect them to be able to acknowledge the change, and hopefully explain what new evidence or reasoning brought them around. If they can't even acknowledge the change, that's like O'Brien trying to claim that the photograph is of different men who just coincidentally happen to look like Jones, Aaronson, and Rutherford.
 322
 323 Likewise, ["Doublethink (Choosing to be Biased)"](https://www.lesswrong.com/posts/Hs3ymqypvhgFMkgLb/doublethink-choosing-to-be-biased) is still up and not retracted, but that didn't stop Yudkowsky from [endorsing everything Xu said](https://twitter.com/ESYudkowsky/status/1436025983522381827) about "whether some categories facilitate inferences that _do_, on the whole, cause more harm than benefit, and if so, whether it is 'rational' to rule that such inferences should be avoided when possible" being different cruxes than "whether 'rational' thinking is 'worth it'".
 324
 325 Here again, given the flexibility of natural language and the fact that the 2021 text does not assert the logical negation of any sentence in the 2007 text, you could totally come up with some clever casuistry for why the two texts are compatible. One could argue: "Doublethink" is narrowly about avoiding, as an individual, the specific form of self-deception in which an individual tries to avoid drawing their own attention to unpleasant facts; [that's a different issue](https://www.lesswrong.com/posts/yDfxTj9TKYsYiWH5o/the-virtue-of-narrowness) from whether some categories facilitate inferences that cause more harm than benefit, especially in public discourse.
 326
 327 But _realistically_—how dumb do you think we are? I would expect someone who's not desperately fixated on splitting whatever hairs are necessary to protect the Caliphate's reputation to notice the obvious generalization from "sane individuals shouldn't hide from facts to save themselves psychological pain, because you need the facts to compute plans that achieve outcomes" to "sane societies shouldn't hide from concepts to save their members psychological pain, because we need concepts to compute plans that achieve outcomes." If Xu and Yudkowsky claim not to see it even after I've called their bluff, how dumb should _I_ think _they_ are? Let me know in the comments.
 328
 329 ### Secrets of the "Vassarties" (October 2021)
 330
 331 In October 2021, Jessica Taylor [published a post about her experiences at MIRI](https://www.lesswrong.com/posts/MnFqyPLqbiKL8nSR7/my-experience-at-and-around-miri-and-cfar-inspired-by-zoe), making analogies between sketchy social pressures she had experienced in the core rationalist community (around short AI timelines, secrecy, deference to community leaders, _&c._) and those reported in [Zoe Curzi's recent account of her time at Leverage Research](https://medium.com/@zoecurzi/my-experience-with-leverage-research-17e96a8e540b).
 332
 333 A 950-comment mega-trainwreck erupted, sparked by a comment by Scott Alexander [claiming to add important context](https://www.lesswrong.com/posts/MnFqyPLqbiKL8nSR7/my-experience-at-and-around-miri-and-cfar-inspired-by-zoe?commentId=4j2GS4yWu6stGvZWs), essentially blaming Jessica's problems on her association with Michael Vassar, to the point of describing her psychotic episode as a "Vassar-related phenomenon" (!).
 334
 335 I explained [why I thought Scott was being unfair](https://www.lesswrong.com/posts/MnFqyPLqbiKL8nSR7/my-experience-at-and-around-miri-and-cfar-inspired-by-zoe?commentId=GzqsWxEp8uLcZinTy). Scott [contended](https://www.lesswrong.com/posts/MnFqyPLqbiKL8nSR7/my-experience-at-and-around-miri-and-cfar-inspired-by-zoe?commentId=XpEpzvHPLkCH7W7jS) that my joining the "Vassarites"[^vassarite-scare-quotes] had been harmful to me, and revealed a new-to-me detail about [the dramatic events of March 2019](/2023/Jul/a-hill-of-validity-in-defense-of-meaning/#overheating). He had emailed my posse at the time:
 336
 337 > accusing them of making your situation worse and asking them to maybe lay off you until you were maybe feeling slightly better, and obviously they just responded with their "it's correct to be freaking about learning your entire society is corrupt and gaslighting" shtick.
 338
 339 [^vassarite-scare-quotes]: Scare quotes because "Vassarite" seems likely to be Alexander's coinage; we didn't call ourselves that.
 340
 341 But I will _absolutely_ bite the bullet on it being correct to freak out about learning your entire Society is corrupt and gaslighting (as I explained to Scott in an asynchronous 22–27 October 2021 conversation on Discord).
 342
 343 Imagine living in the Society of Alexander's ["Kolmogorov Complicity and the Parable of Lightning"](https://slatestarcodex.com/2017/10/23/kolmogorov-complicity-and-the-parable-of-lightning/) (which I keep linking) in the brief period when the lightning taboo is being established, trying to make sense of everyone you know suddenly deciding, seemingly in lockstep, that thunder comes before lightning. (When you try to point out that this isn't true and no one believed it five years ago, they point out that it depends on what you mean by the word 'before'.)
 344
 345 Eventually, you would get used to it, but at first, I think this would be legitimately pretty upsetting! If you were already an emotionally fragile person, it might even escalate to a psychiatric emergency through the specific mechanism "everyone I trust is inexplicably lying about lightning → stress → sleep deprivation → temporary psychosis". That is, it's not that Society being corrupt directly causes mental illness—that would be silly—but confronting a corrupt Society is very stressful, and that can [snowball into](https://lorienpsych.com/2020/11/11/ontology-of-psychiatric-conditions-dynamic-systems/) things like lost sleep, and sleep is [really](https://www.jneurosci.org/content/34/27/9134.short) [biologically important](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6048360/).
 346
 347 This is a pretty bad situation to be in—to be faced with the question, "Am _I_ crazy, or is _everyone else_ crazy?" But one thing that would make it slightly less bad is if you had a few allies, or even just _an_ ally—someone to confirm that the obvious answer, "It's not you," is, in fact, obvious.
 348
 349 But in a world where [everyone who's anyone](https://thezvi.wordpress.com/2019/07/02/everybody-knows/) agrees that thunder comes before lightning—including all the savvy consequentialists who realize that being someone who's anyone is an instrumentally convergent strategy for acquiring influence—anyone who would be so imprudent as to take your everyone-is-lying-about-lightning concerns seriously would have to be someone with ... a nonstandard relationship to social reality. Someone meta-savvy to the process of people wanting to be someone who's anyone. Someone who, bluntly, is kind of an asshole. Someone like—Michael Vassar!
 350
 351 From the perspective of an outside observer playing a Kolmogorov-complicity strategy, your plight might look like "innocent person suffering from mental illness in need of treatment/management", and your ally like "bad influence who is egging the innocent person on for their own unknown but probably nefarious reasons". If that outside observer chooses to draw the category boundaries of "mental illness" appropriately, that story might even be true. So why not quit making such a fuss, and accept treatment? Why fight, if fighting comes at a personal cost? Why not submit?
 352
 353 I had my answer. But I wasn't sure that Scott would understand.
 354
 355 To assess whether joining the "Vassarites" had been harmful to me, one would need to answer: as compared to what? In the counterfactual where Michael vanished from the world in 2016, I think I would have been just as upset about the same things for the same reasons, but with fewer allies and fewer ideas to make sense of what was going on in my social environment.
 356
 357 Additionally, it was really obnoxious when people had tried to use my association with Michael to discredit the content of what I was saying—interpreting me as Michael's pawn. Gwen, one of the "Zizians", in a blog post about her grievances against CfAR, has [a section on "Attempting to erase the agency of everyone who agrees with our position"](https://archive.is/o2gDb#attempting-to-erase-the-agency-of-everyone-who-agrees-with-our-position), complaining about how people try to cast her and Somni and Emma as Ziz's minions, rather than acknowledging that they're separate people with their own ideas who had good reasons to work together. I empathized a lot with this. My thing, and separately Ben Hoffman's [thing about Effective Altruism](http://benjaminrosshoffman.com/drowning-children-rare/), and separately Jessica's thing in the OP, didn't really have a whole lot to do with each other, except as symptoms of "the so-called 'rationalist' community is not doing what it says on the tin" (which itself wasn't a very specific diagnosis). But insofar as our separate problems did have a hypothesized common root cause, it made sense for us to talk to each other and to Michael about them.
 358
 359 Was Michael using me, at various times? I mean, probably. But just as much, _I was using him_. Particularly with [the November 2018–April 2019 thing](/2023/Jul/a-hill-of-validity-in-defense-of-meaning/) (where I and the "Vassarite" posse kept repeatedly pestering Scott and Eliezer to clarify that categories aren't arbitrary): that was the "Vassarites" doing an enormous favor for me and my agenda. (If Michael and crew hadn't had my back, I wouldn't have been anti-social enough to keep escalating.) And here Scott was trying to get away with claiming that _they_ were making my situation worse? That was absurd. Had he no shame?
 360
 361 I did, I admitted, have some specific, nuanced concerns—especially since [the December 2020 psychiatric disaster](/2023/Dec/if-clarity-seems-like-death-to-them/#a-dramatic-episode-that-would-fit-here-chronologically), with some nagging doubts beforehand—about ways in which being an inner-circle "Vassarite" might be bad for someone, but at the moment, I was focused on rebutting Scott's story, which was _silly_. A defense lawyer has an easier job than a rationalist—if the prosecution makes a terrible case, you can just destroy it, without it being your job to worry about whether your client is separately guilty of vaguely similar crimes that the incompetent prosecution can't prove.
 362
 363 Scott expressed concern that the "Vassarites" were exerting psychological pressure on me to identify with my sense of betrayal, citing accounts [from Ziz](https://sinceriously.fyi/punching-evil/#comment-2345) ("They spent 8 hours shouting at me, gaslighting me") [and Yudkowsky](https://twitter.com/ESYudkowsky/status/1356494768960798720) ("When MichaelV and co. try to run a 'multiple people yelling at you' operation on me, I experience that as 'lol, look at all that pressure' instead _feeling pressured_").
 364
 365 I thought I recognized the purported everyone-yelling behavior pattern from the ugly conflict during [the December 2020 psychiatric crisis](/2023/Dec/if-clarity-seems-like-death-to-them/#a-dramatic-episode-that-would-fit-here-chronologically). (Jessica called me transphobic scum; Michael said that I should have never been born, that I should be contemplating suicide, that I could barely begin to make it up to the person in my care if I gave them everything I own; Jack Gallagher said that I'm only useful as an example of how bad other people should feel, if they knew what I knew.) I told Scott that the everyone-yelling thing seemed like a new innovation (that I didn't like) that they wielded as a psychological weapon only against people who they thought were operating in bad faith? That wasn't what it was like to actually be friends with them.
 366
 367 (When I shared an earlier version of this post with Michael, Ben, and Jessica, it was pointed out that "yelling" is often in the eye of the beholder: if Kimberly is acting in bad faith, and Lucas insists on pointing it out, Kimberly might motivatedly misperceive Lucas as "yelling" even if he hadn't raised his voice. I agree that this is a critical grain of salt with which such reports should be taken.)
 368
 369 In the present conversation with Scott, I had been focusing on rebutting the claim that my February–April 2017 (major) and March 2019 (minor) psych problems were caused by the "Vassarites", because with regard to those _specific_ incidents, the charge was absurd and false. But, well ... my January 2021 (minor) psych problems actually _were_ the result of being on the receiving end of (what I perceived as) "the everyone-yelling thing". (Around midnight 18–19 December 2020, I was near psychosis—there's this very distinct fear-of-Hell sensation—but because I had been there before and knew what was happening to me, and because I already knew not to take Michael literally, I was able to force myself to lie down and get some sleep and not immediately go crazy, although I did struggle for the next month: I ended up taking a week of leave from my dayjob and got a Seroquel prescription.)
 370
 371 Scott said that based on my and others' testimony, he was updating away from Vassar being as involved in psychotic breaks as he thought, but towards thinking Vassar was worse in other ways than he thought. He felt sorry for my bad December 2020/January 2021 experience—so much that he could feel it through the triumphant vindication at getting confirmation that the "Vassarites" were behaving badly in ways he couldn't previously prove.
 372
 373 Great, I said, I was happy to provide information to help hold people (including Michael as a particular instance of "people") accountable for the specific bad things that they're actually guilty of, rather than scapegoated as a Bad Man with mysterious witch powers.
 374
 375 I pointed out that that's exactly what one would expect if the Vassar/breakdown correlation was mostly a selection effect rather than causal—that is, if the causal graph was the fork "prone-to-psychosis ← underlying-bipolar-ish-condition → gets-along-with-Michael".
 376
 377 I had also had a sleep-deprivation-induced-psychotic-break-with-hospitalization in February 2013, and shortly thereafter, I remember Anna remarking that I was sounding a lot like Michael. But I hadn't been talking to Michael at all beforehand! (My previous email conversation with him had been in 2010.) So what could Anna's brain have been picking up on, when she said that? My guess: there was some underlying dimension of psychological variation (psychoticism? bipolar?—you tell me; this is supposed to be Scott's professional specialty) where Michael and I were already weird/crazy in similar ways, and sufficiently bad stressors could push me further along that dimension (enough for Anna to notice). Was Scott also going to blame Yudkowsky for making people [autistic](https://twitter.com/ESYudkowsky/status/1633396201427984384)?
 378
 379 Concerning the lightning parable, Scott said that from his perspective, the point of "Kolmogorov Complicity" was that, yes, people can be crazy, but that we have to live in Society without spending all our time freaking out about it. If, back in the days of my ideological anti-sexism, the first ten Yudkowsky posts I had read had said that men and women are psychologically different for biological reasons and that anyone who denies this is a mind-killed idiot—which Scott assumed Yudkowsky did think—he could imagine me being turned off. It was probably good for me and the world that that wasn't my first ten experiences of the rationalist community.
 380
 381 I agreed that this was a real concern. (I had been so enamored with Yudkowsky's philosophy-of-science writing that there was no chance of _me_ bouncing on account of the sexism that I perceived, but I wasn't the marginal case.) There are definitely good reasons to tread carefully when trying to add sensitive-in-our-culture content to Society's shared map. But I didn't think treading carefully should take precedence over _getting the goddamned right answer_.
 382
 383 As an example of what I thought treading carefully but getting the goddamned right answer looked like, I was really proud of [my April 2020 review of Charles Murray's _Human Diversity_](/2020/Apr/book-review-human-diversity/). I definitely wasn't saying, Emil Kirkegaard-style, "the black/white IQ gap is genetic, anyone who denies this is a mind-killed idiot." Rather, _first_ I reviewed the Science in the book, and _then_ I talked about the politics surrounding Murray's reputation and the technical reasons for believing that the gap is real and partly genetic, and _then_ I went meta on the problem and explained why it makes sense that political forces make this hard to talk about. I thought this was how one goes about mapping the territory without being a moral monster with respect to one's pre-Dark Enlightenment morality. (And [Emil was satisfied, too](https://twitter.com/KirkegaardEmil/status/1425334398484983813).)
 384
 385 ### Recovering from the Personality Cult (September 2021–March 2022)
 386
 387 At the end of the September 2021 Twitter altercation, I [said that I was upgrading my "mute" of @ESYudkowsky to a "block"](https://twitter.com/zackmdavis/status/1435468183268331525). Better to just leave, rather than continue to hang around in his mentions trying (consciously [or otherwise](https://www.lesswrong.com/posts/sXHQ9R5tahiaXEZhR/algorithmic-intent-a-hansonian-generalized-anti-zombie)) to pick fights, like a crazy ex-girlfriend. (["I have no underlying issues to address; I'm certifiably cute, and adorably obsessed"](https://www.youtube.com/watch?v=UMHz6FiRzS8) ...)
 388
 389 I did end up impulsively writing one more comment on one of his Facebook posts (with an aside at the top about whether that was OK), and Yudkowsky [said that Twitter looked worse for me than Facebook](/images/yudkowsky-twitter_is_worse_for_you.png)—the implication being that I _did_ still have commenting privileges as far as he was concerned. Good. I'm proud to be a crazy ex-girlfriend who knows she's crazy and _voluntarily_ deletes your number from her phone, rather than the crazy ex-girlfriend you need to block.
 390
 391 I still had more things to say—a reply to the February 2021 post on pronoun reform, and the present memoir telling this Whole Dumb Story—but those could be written and published unilaterally. Given that we clearly weren't going to get to clarity and resolution, I didn't want to bid for any more of my ex-hero's attention and waste more of his time (valuable time, _limited_ time); I still owed him for creating me.
 392
 393 Leaving a personality cult is hard. As I struggled to write, I noticed that I was wasting a lot of cycles worrying about what he'd think of me, rather than saying the things I needed to say. I knew it was pathetic that my religion was so bottlenecked on _one guy_—particularly since the holy texts themselves (written by that one guy) [explicitly said not to do that](https://www.lesswrong.com/posts/t6Fe2PsEwb3HhcBEr/the-litany-against-gurus)—but unwinding those psychological patterns was still a challenge.
 394
 395 An illustration of the psychological dynamics at play: on an August 2021 EA Forum post about demandingness objections to longtermism, Yudkowsky [commented that](https://forum.effectivealtruism.org/posts/fStCX6RXmgxkTBe73/towards-a-weaker-longtermism?commentId=Kga3KGx6WAhkNM3qY) he was "broadly fine with people devoting 50%, 25% or 75% of themselves to longtermism [...] as opposed to tearing themselves apart with guilt and ending up doing nothing much, which seem[ed] to be the main alternative."
 396
 397 I found the comment reassuring regarding the extent or lack thereof of my own contributions to the great common task—and that's the problem: I found the _comment_ reassuring, not the _argument_. It would make sense to be reassured by the claim (if true) that human psychology is such that I don't realistically have the option of devoting more than 25% of myself to the great common task. It does not make sense to be reassured that Eliezer Yudkowsky said he's broadly fine with it. That's just being a personality-cultist.
 398
 399 In January 2022, in an attempt to deal with my personality-cultist writing block, I sent him one last email asking if he particularly _cared_ if I published a couple blog posts that said some negative things about him. If he actually _cared_ about potential reputational damage to him from my writing things that I thought I had a legitimate interest in writing about, I would be _willing_ to let him pre-read the drafts before publishing and give him the chance to object to anything he thought was unfair ... but I'd rather agree that that wasn't necessary. I explained the privacy norms that I intended to follow—that I could explain _my_ actions, but had to Glomarize about the content of any private conversations that may or may not have occurred.
 400
 401 It had taken me a while (with apologies for my atrocious [sample efficiency](https://ai.stackexchange.com/a/5247)), but I was finally ready to give up on him; I thought the efficient outcome was that I should just tell my Whole Dumb Story on my blog and never bother him again. Since he probably _didn't_ particularly care (because it's not AGI alignment and therefore unimportant) and it would be psychologically easier on me if I knew he didn't hold it against me, could I please have his advance blessing to just write and publish what I was thinking so I can get it all out of my system and move on with my life?
 402
 403 If it helped—as far as _I_ could tell, I was only doing what _he_ taught me to do in 2007–2009: [carve reality at the joints](https://www.lesswrong.com/posts/esRZaPXSHgWzyB2NL/where-to-draw-the-boundaries), [speak the truth even if your voice trembles](https://www.lesswrong.com/posts/pZSpbxPrftSndTdSf/honesty-beyond-internal-truth), and [make an extraordinary effort](https://www.lesswrong.com/posts/GuEsfTpSDSbXFiseH/make-an-extraordinary-effort) when you've got [Something to Protect](https://www.lesswrong.com/posts/SGR4GxFK7KmW7ckCB/something-to-protect) (Subject: "blessing to speak freely, and privacy norms?").
 404
 405 I can't say whether he replied (because if he did, that would be covered by the privacy norm), but I think sending the email helped me. Although maybe I was wrong to ask if he wouldn't hold it against me. If you read the text of this memoir, I'm clearly holding things against _him_. If he's not my caliph anymore (with the asymmetrical duties between ruler and subject, the higher to protect and the lower to serve), and I'm entitled to my feelings, isn't he entitled to his?
 406
 407 In February 2022, I finally managed to finish a draft of ["Challenges to Yudkowsky's Pronoun Reform Proposal"](/2022/Mar/challenges-to-yudkowskys-pronoun-reform-proposal/). (A year after the post it replies to! I did other things that year, probably.) It's long (12,000 words), because I wanted to be thorough and cover all the angles. (To paraphrase Ralph Waldo Emerson, when you strike at Eliezer Yudkowsky, _you must kill him._)
 408
 409 If I had to compress it by a factor of 200 (down to 60 words), I'd say my main point was that, given a conflict over pronoun conventions, there's no "right answer", but we can at least be objective in _describing what the conflict is about_, and Yudkowsky wasn't doing that; his "simplest and best proposal" favored the interests of some parties to the dispute (as was seemingly inevitable), _without admitting he was doing so_ (which was not inevitable).[^describing-the-conflict]
 410
 411 [^describing-the-conflict]: I had been making this point for four years. [As I wrote in February 2018's "The Categories Were Made for Man to Make Predictions"](/2018/Feb/the-categories-were-made-for-man-to-make-predictions/#describing-the-conflict), "If different political factions are engaged in conflict over how to define the extension of some common word [...] rationalists may not be able to say that one side is simply right and the other is simply wrong, but we can at least strive for objectivity in _describing the conflict_."
 412
 413 In addition to prosecuting the object level (about pronouns) and the meta level (about acknowledging the conflict) for 12,000 words, I had also written _another_ several thousand words at the meta-meta level, about the political context of the argument and Yudkowsky's comments about what is "sometimes personally prudent and not community-harmful", but I wasn't sure whether to include it in the post itself, or post it as a separate comment on the _Less Wrong_ linkpost mirror, or save it for the memoir. I was worried about it being too "aggressive", attacking Yudkowsky too much, disregarding our usual norms about only attacking arguments and not people. I wasn't sure how to be aggressive and explain _why_ I wanted to disregard the usual norms in this case (why it was _right_ to disregard the usual norms in this case) without the Whole Dumb Story of the previous six years leaking in (which would take even longer to write).
 414
 415 I asked "Riley" for political advice. I thought my arguments were very strong, but that the object-level argument about pronoun conventions just wasn't very interesting; what I _actually_ wanted people to see was the thing where the Big Yud of the current year _just can't stop lying for political convenience_. How could I possibly pull that off in a way that the median _Less Wrong_-er would hear? Was it a good idea to "go for the throat" with the "I'm better off because I don't trust Eliezer Yudkowsky to tell the truth in this domain" line?
 416
 417 "Riley" said the post was super long and boring. ("Yes. I'm bored, too," I replied.) They said that I was optimizing for my having said the thing, rather than for the reader being able to hear it. In the post, I had complained that you can't have it both ways: either pronouns convey sex-category information (in which case, people who want to use natal-sex categories have an interest in defending their right to misgender), or they don't (in which case, there would be no reason for trans people to care about what pronouns people use for them). But by burying the thing I actually wanted people to see in thousands of words of boring argumentation, I was evading the fact that _I_ couldn't have it both ways: either I was calling out Yudkowsky as betraying his principles and being dishonest, or I wasn't.
 418
 419 "[I]f you want to say the thing, say it," concluded "Riley". "I don't know what you're afraid of."
 420
 421 I was afraid of taking irrevocable war actions against the person who taught me everything I know. (And his apparent conviction that the world was ending _soon_ made it worse. Wouldn't it feel petty, if the last thing you ever said to your grandfather was calling him a liar in front of the whole family, even if he had in fact lied?)
 422
 423 I wanted to believe that if I wrote all the words dotting every possible _i_ and crossing every possible _t_ at all three levels of meta, then that would make it [a description and not an attack](http://benjaminrosshoffman.com/can-crimes-be-discussed-literally/)—that I could have it both ways if I explained the lower level of organization beneath the high-level abstractions of "betraying his principles and being dishonest." If that didn't work because [I only had five words](https://www.lesswrong.com/posts/4ZvJab25tDebB8FGE/you-have-about-five-words), then—I didn't know what I'd do. I'd think about it.
 424
 425 After a month of dawdling, I eventually decided to pull the trigger on publishing "Challenges", without the extended political coda.[^coda] The post was a little bit mean to Yudkowsky, but not so mean that I was scared of the social consequences of pulling the trigger. (Yudkowsky had been mean to Christiano and Richard Ngo and Rohin Shah in [the recent MIRI dialogues](https://www.lesswrong.com/s/n945eovrA3oDueqtq); I didn't think this was worse than that.)
 426
 427 [^coda]: The text from the draft coda would later be incorporated into the present memoir.
 428
 429 I cut the words "in this domain" from the go-for-the-throat concluding sentence that I had been worried about. "I'm better off because I don't trust Eliezer Yudkowsky to tell the truth," full stop.
 430
 431 The post was a _critical success_ by my accounting, due to eliciting [a highly-upvoted (110 karma at press time) comment by _Less Wrong_ administrator Oliver Habryka](https://www.lesswrong.com/posts/juZ8ugdNqMrbX7x2J/challenges-to-yudkowsky-s-pronoun-reform-proposal?commentId=he8dztSuBBuxNRMSY) on the _Less Wrong_ mirror. Habryka wrote:
 432
 433 > [...] basically everything in this post strikes me as "obviously true" and I had a very similar reaction to what the OP says now, when I first encountered the Eliezer Facebook post that this post is responding to.
 434 >
 435 > And I do think that response mattered for my relationship to the rationality community. I did really feel like at the time that Eliezer was trying to make my map of the world worse, and it shifted my epistemic risk assessment of being part of the community from "I feel pretty confident in trusting my community leadership to maintain epistemic coherence in the presence of adversarial epistemic forces" to "well, I sure have to at least do a lot of straussian reading if I want to understand what people actually believe, and should expect that depending on the circumstances community leaders might make up sophisticated stories for why pretty obviously true things are false in order to not have to deal with complicated political issues".
 436 >
 437 > I do think that was the right update to make, and was overdetermined for many different reasons, though it still deeply saddens me.
 438
 439 Brutal! Recall that Yudkowsky's justification for his behavior had been that "it is sometimes personally prudent and _not community-harmful_ to post your agreement with Stalin" (emphasis mine), and here we had the administrator of Yudkowsky's _own website_ saying that he's deeply saddened that he now expects Yudkowsky to _make up sophisticated stories for why pretty obviously true things are false_ (!!).
 440
 441 Is that ... _not_ evidence of harm to the community? If that's not community-harmful in Yudkowsky's view, then what would be an example of something that _would_ be? _Reply, motherfucker!_
 442
 443 ... or rather, "Reply, motherfucker", is what I fantasized about being able to say, if I hadn't already expressed an intention not to bother him anymore.
 444
 445 ### The Death With Dignity Era (April 2022)
 446
 447 On 1 April 2022, Yudkowsky published ["MIRI Announces New 'Death With Dignity' Strategy"](https://www.lesswrong.com/posts/j9Q8bRmwCgXRYAgcJ/miri-announces-new-death-with-dignity-strategy), a cry of despair in the guise of an April Fool's Day post. MIRI didn't know how to align a superintelligence, no one else did either, but AI capabilities work was continuing apace. With no credible plan to avert almost-certain doom, the most we could do now was to strive to give the human race a more dignified death, as measured in log-odds of survival: an alignment effort that doubled the probability of a valuable future from 0.0001 to 0.0002 was worth one information-theoretic bit of dignity.
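
To make the arithmetic concrete, here is a minimal sketch, assuming that "dignity" is measured as the base-2 log-odds of a valuable future. The helper name `log_odds_bits` is made up for illustration and does not come from the post.

```python
import math


def log_odds_bits(p: float) -> float:
    """Return the log-odds of probability p, in bits (base-2 logarithm)."""
    return math.log2(p / (1 - p))


# Hypothetical before/after probabilities, taken from the example in the text.
before = log_odds_bits(0.0001)
after = log_odds_bits(0.0002)

# Bits of "dignity" gained by doubling the probability.
gain = after - before
print(f"{gain:.4f} bits")
```

At probabilities this small, the odds p / (1 − p) are nearly equal to p itself, so doubling the probability adds very nearly one bit of log-odds.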
 448
 449 In a way, "Death With Dignity" isn't really an update. Yudkowsky had always refused to name a "win" probability, while maintaining that Friendly AI was ["impossible"](https://www.lesswrong.com/posts/nCvvhFBaayaXyuBiD/shut-up-and-do-the-impossible). Now, he says the probability is approximately zero.
 450
 451 Paul Christiano, who has a much more optimistic picture of humanity's chances, nevertheless said that he liked the "dignity" heuristic. I like it, too. It—takes some of the pressure off. I [made an analogy](https://www.lesswrong.com/posts/j9Q8bRmwCgXRYAgcJ/miri-announces-new-death-with-dignity-strategy?commentId=R59aLxyj3rvjBLbHg): your plane crashed in the ocean. To survive, you must swim to shore. You know that the shore is west, but you don't know how far. The optimist thinks the shore is just over the horizon; we only need to swim a few miles and we'll probably make it. The pessimist thinks the shore is a thousand miles away and we will surely die. But the optimist and pessimist can both agree on how far we've swum up to this point, and that the most dignified course of action is "Swim west as far as you can."