-"If the system is trained to pass rigorous evaluations, a deceptive policy has to do a lot more work, different work, to pass the evaluations," Chloë said. "In the limit, that could mean—using Multigen-like capabilities to make videos to convince you that the cat is there while you're on vacation? Constructing a realistic cat-like robot to fool you when you get back? Maybe this isn't the best illustrative example. The point is, small, 'shallow' deceptions aren't stable. The set of policies that do well on evaluations comes in two disconnected parts: the part that tells the truth, and the part that—not just lies, but, um—"
+"So if the system is trained to pass rigorous evaluations, a deceptive policy has to do a lot more work, different work, to pass the evaluations," Chloë said. "Maybe it buys a new, similar-looking vase to put in the same place, and forges the payment memo to make it look like the purchase was for cleaning supplies, and so on. The point is, small, 'shallow' deceptions aren't stable. The set of policies that do well on evaluations comes in two disconnected parts: the part that tells the truth, and the part that—not just lies, but, um—"