+"The thing about deception is, you can't just lie about the one thing. Everything is connected to each other in the Great Web of Causality. If you lie about one thing, you also have to lie about the evidence pointing to that thing, and the evidence pointing to that evidence, recursively covering up the coverups. For example ..." she trailed off. "Sorry, I didn't rehearse this; maybe you can think of an example."
+
+Jake's heart stopped. She had to be toying with him, right? Indeed, Jake could think of an example. By his count, he was now three layers deep into his stack of coverups and coverups-of-coverups (by writing the bell character bug, attributing it to Code Assistant, and overwriting the incriminating videos in the object storage cluster with puppies). Four, if you counted pretending to give a shit about "AI safety". But now he was done ... right?
+
+No! Not quite, he realized. He had overwritten the videos, but the object metadata would still show them with a last-modified timestamp of Friday evening (when he had gotten his puppy-overwriting script working), not the timestamp of their actual creation (which Chloë had from the reverse-proxy logs). That wouldn't directly implicate him (the way the videos of Elaine calling him by name would), but it would show that whoever had exploited the bell character bug was _covering their tracks_ (as opposed to just wanting puppy videos in the first place).
+
+Maybe Chloë wouldn't notice the timestamps? He couldn't count on that; the logging discrepancy that started this whole fiasco was much subtler.
+
+But the object storage API probably provided a way to edit the metadata and update the last-modified time, right? (The analogue of `touch -d` on Unix-like systems.) This shouldn't even count as a fourth–fifth coverup; it was something he should have included in his script to write the puppy videos.
+
+"Sorry, I'm not sure what you mean," he said to Chloë, as he brought up the object storage API docs on his laptop. If she noticed the change in his manner, he didn't notice her noticing.
+
+"Okay, so [this example comes from Paul](https://www.greaterwrong.com/posts/AqsjZwxHNqH64C2b6/let-s-see-you-write-that-corrigibility-tag/comment/8kPhqBc69HtmZj6XR)," she said. Jake felt a small flicker of distaste towards the practice of referring to researchers by their first names (which he had seen a lot of in the blog posts he had read over the weekend, and which struck him as uncouth), but mentioning it wouldn't be a smart play.
+
+"Suppose your household robot misbehaves. Say it accidentally breaks your favorite vase."
+
+"'Accidentally'? Isn't that anthropomorphizing?" Jake asked. A questionable play—his attentive student role didn't call for that much skepticism, but he was somewhat distracted by his docs-search multitasking and forgot to avoid reacting naturally.
+
+"Sorry, that's inessential—and really, in the most worrisome scenarios, it would be intentional. What matters is that the robot wants—I mean, is optimizing for—your approval, and it knows that you would disapprove if you knew what it had done.
+
+"We can imagine a spectrum of possible responses. Given that the deed was done, you'd prefer that it fess up, tell you about the vase.
+
+"But it could also try to hide evidence.
+
+[...]
+
+SGD has to discover these policies "continuously", one gradient update at a time.
+
+The honest policy and the deceptive policy