Suppose that, even among the very few exceptions that aren't all-zeros or all-ones, the first bit is _always_ in the majority and is never "flipped": you can have exceptions that "look like" `00000100000000000000` or `11011111111101111011`, but never `10000000000000000000` or `01111111111111111111`.
Then if you wanted an efficient encoding to talk about the two and only two _clusters_ of bitstrings—the mostly-zeros (a majority of `00000000000000000000` plus a few exceptions with a few bits flipped) and the mostly-ones (a majority of `11111111111111111111` plus a few exceptions with a few bits flipped)—you might want to use the first bit as the "definition" for your codewords—even if most of the various [probabilistic inferences that you wanted to make](https://www.lesswrong.com/posts/3nxs2WYDGzJbzcLMp/words-as-hidden-inferences) [on the basis of cluster-membership](https://www.lesswrong.com/posts/gDWvLicHhcMfGmwaK/conditional-independence-and-naive-bayes) concerned bits other than the first. The majoritarian first bit, even if you don't care about it in itself, is a [_simple_ membership test](https://www.lesswrong.com/posts/edEXi4SpkXfvaX42j/schelling-categories-and-simple-membership-tests) for the mostly-zeros/mostly-ones category system.
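To make this concrete, here is a minimal Python sketch of the setup. It rests on assumptions that are not in the text above: the function names are hypothetical, each bit other than the first is flipped independently, and the 5% flip probability is an arbitrary illustrative value. It compares the cheap first-bit test against the more expensive "count all the bits" test.

```python
import random

N_BITS = 20  # length of the example bitstrings in the text


def sample_bitstring(rng, flip_prob=0.05):
    """Draw one bitstring from an assumed two-cluster distribution.

    Assumption for illustration: the first bit is never flipped, and each
    of the remaining bits is flipped independently with probability flip_prob.
    """
    base = rng.choice("01")
    flipped = "1" if base == "0" else "0"
    rest = [flipped if rng.random() < flip_prob else base
            for _ in range(N_BITS - 1)]
    return base + "".join(rest)


def cluster_by_first_bit(bits):
    """Simple membership test: look only at the first bit."""
    return "mostly-ones" if bits[0] == "1" else "mostly-zeros"


def cluster_by_majority(bits):
    """Costlier test: count every bit and take the majority value."""
    return "mostly-ones" if bits.count("1") > len(bits) / 2 else "mostly-zeros"


rng = random.Random(0)  # fixed seed so runs are repeatable
samples = [sample_bitstring(rng) for _ in range(10_000)]
agreement = sum(
    cluster_by_first_bit(s) == cluster_by_majority(s) for s in samples
) / len(samples)
print(f"first-bit test agrees with majority test on {agreement:.2%} of samples")
```

Under these assumptions, the two tests should agree on essentially every sample, even though the first-bit test inspects one bit rather than twenty. That is the sense in which the first bit can serve as the "definition" while the inferences you care about concern the other bits.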